How can you write to multiple outputs, depending on the key, in a single MapReduce job using Scalding (/Cascading)? I could of course use .filter
for every possible key, but that is a horrible hack which will fire up many jobs.
Write to multiple outputs by key Scalding Hadoop, one MapReduce Job
There is TemplatedTsv in Scalding (from version 0.9.0rc16 onwards), directly analogous to Cascading's TemplateTap.
Tsv(args("input"), ('COUNTRY, 'GDP))
.read
.write(TemplatedTsv(args("output"), "%s", 'COUNTRY))
// In Hadoop mode, this creates one directory per country under the "output" path.
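For copy-and-paste purposes, the snippet above might be embedded in a complete job roughly like this (a sketch assuming Scalding 0.9.0rc16+; the class name and argument keys are illustrative):

```scala
import com.twitter.scalding._

class SplitByCountryJob(args: Args) extends Job(args) {
  // Read (COUNTRY, GDP) pairs from a tab-separated input.
  Tsv(args("input"), ('COUNTRY, 'GDP))
    .read
    // "%s" is filled in with each tuple's COUNTRY value, so in Hadoop
    // mode each distinct country gets its own sub-directory under the
    // output path, e.g. output/US/, output/FR/.
    .write(TemplatedTsv(args("output"), "%s", 'COUNTRY))
}
```

Run it like any other Scalding job, e.g. with `--input` and `--output` arguments on the command line.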
This looks even more flexible than what I requested! Thanks. Could you say which Scalding version this is from? Is it 0.10.0 and above, or 0.9.0? –
Blues
From the codebase, it looks to be available from version 0.9.0rc16. –
Stith
@Stith Is there any way to drop the fields used in the template string? In your example, I basically want the resulting files to contain only the 'GDP field. –
Umbles
@Umbles That is a really good question, but I do not know whether it is possible with the current TemplatedTsv implementation. However, you can write your own MyTemplatedTsv, like here github.com/twitter/scalding/blob/0.11.0/scalding-core/src/main/…, add "override val fields = Fields.ALL", and specify the fields to be written when calling that tap. Could you please reply here if you test that? –
Stith
Use MultipleOutputFormat and write a custom output class on top of that output format, extrapolating from these other SO questions: Create Scalding Source like TextLine that combines multiple files into single mappers, and Compress Output Scalding / Cascading TsvCompressed.
This suggestion on the Cascading user group recommends using Cascading's TemplateTap. I am not sure how to connect it to Scalding, though.
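One way to connect the two would be a custom Scalding Source whose tap is a TemplateTap wrapping an Hfs tap. This is only a rough sketch: the class name is made up, and Cascading's constructor signatures differ between versions, so treat it as a starting point rather than working code.

```scala
import cascading.scheme.hadoop.TextDelimited
import cascading.tap.Tap
import cascading.tap.hadoop.{ Hfs, TemplateTap }
import cascading.tuple.Fields
import com.twitter.scalding._

// Hypothetical source (name is illustrative): routes each tuple to a
// sub-directory under basePath chosen by filling `template` with the
// tuple's field values, e.g. template = "%s" keyed on the first field.
case class TemplateTapTsv(basePath: String, template: String) extends Source {

  override def createTap(readOrWrite: AccessMode)(implicit mode: Mode): Tap[_, _, _] = {
    val parent = new Hfs(new TextDelimited(Fields.ALL, "\t"), basePath)
    // TemplateTap opens one child tap per distinct formatted path.
    new TemplateTap(parent, template)
  }
}
```

With something like this, the write side would look the same as any other Scalding sink: `pipe.write(TemplateTapTsv(args("output"), "%s"))`.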
That certainly looks promising; care to provide Scalding code for people's copy-and-paste needs? :) –
Blues