Storing data to SequenceFile from Apache Pig

Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:

REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;

DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

log = LOAD '/data/logs' USING SequenceFileLoader AS (...);

Is there also a library out there that would allow writing to Hadoop sequence files from Pig?

Cinthiacintron answered 11/3, 2010 at 9:52

It's just a matter of implementing a StoreFunc to do so.

This is possible now, although it will become a fair bit easier once Pig 0.7 comes out, as it includes a complete redesign of the Load/Store interfaces.
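As a rough illustration, here is a minimal sketch of such a StoreFunc. It assumes the Pig 0.7+ Load/Store API (`getOutputFormat`, `setStoreLocation`, `prepareToWrite`, `putNext`), Hadoop's new-style `mapreduce` classes, and the pig and hadoop jars on the classpath. The class name `TextSequenceFileStorer` is hypothetical, and the sketch assumes every tuple has exactly two fields, written as a `Text` key and a `Text` value.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;

/**
 * Hypothetical StoreFunc that writes each (key, value) tuple as a
 * Text/Text record into a Hadoop SequenceFile. Sketch only, assuming
 * the Pig 0.7+ StoreFunc API.
 */
public class TextSequenceFileStorer extends StoreFunc {
    private RecordWriter<Text, Text> writer;
    // Reused for every record to avoid allocating per tuple
    private final Text key = new Text();
    private final Text value = new Text();

    @Override
    public OutputFormat getOutputFormat() throws IOException {
        // Hadoop's stock SequenceFile output format does the actual file writing
        return new SequenceFileOutputFormat<Text, Text>();
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        // Declare the record types and point the job at the STORE ... INTO path
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(location));
    }

    @SuppressWarnings("unchecked")
    @Override
    public void prepareToWrite(RecordWriter writer) throws IOException {
        // Pig hands over the writer created by getOutputFormat()
        this.writer = (RecordWriter<Text, Text>) writer;
    }

    @Override
    public void putNext(Tuple t) throws IOException {
        if (t.size() != 2) {
            throw new IOException("Expected a (key, value) tuple, got " + t.size() + " fields");
        }
        // Every field is stringified; a null field becomes the string "null"
        key.set(String.valueOf(t.get(0)));
        value.set(String.valueOf(t.get(1)));
        try {
            writer.write(key, value);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
    }
}
```

Once compiled into a jar, it could presumably be used much like the loader in the question, e.g. `REGISTER mystorer.jar;` followed by `STORE log INTO '/data/out' USING TextSequenceFileStorer();` (jar name and output path are placeholders). Converting everything to `Text` keeps the sketch short; a real storer would more likely map Pig types to matching Writables such as `IntWritable`.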

The "Hadoop expansion pack" that Twitter open-sourced on GitHub includes code for generating Load and Store funcs based on Google Protocol Buffers (building on Input/Output formats for the same; you already have those for sequence files, obviously). Check it out if you need examples of how to do some of the less trivial stuff. It should be fairly straightforward, though.

Myramyrah answered 12/3, 2010 at 12:24

This seemed to work for me. https://github.com/kevinweil/elephant-bird/pull/73

Swanhilda answered 31/5, 2012 at 22:07
