DynamoDB InputFormat for Hadoop

2

3

I need to process data stored in Amazon DynamoDB using Hadoop MapReduce.

I searched the internet for a Hadoop InputFormat for DynamoDB and couldn't find one. I'm not familiar with DynamoDB, so I'm guessing there is some trick to using it with Hadoop. If an implementation of this InputFormat exists anywhere, could you please share it?

Marlea answered 22/10, 2012 at 21:22
M
3

After a lot of searching, I found DynamoDBInputFormat and DynamoDBOutputFormat in one of Amazon's libraries.

On Amazon Elastic MapReduce there is a library called hive-bigbird-handler that contains input and output formats for DynamoDB. The full class names are org.apache.hadoop.hive.dynamodb.write.DynamoDBOutputFormat and org.apache.hadoop.hive.dynamodb.read.DynamoDBInputFormat.
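As a rough illustration of how a job driver might point at the input format named above, here is a minimal sketch in plain Java that only collects the settings. The helper class `DynamoDbJobSettings` and its method are hypothetical. The property keys are an assumption borrowed from Amazon's later open-source emr-dynamodb-connector and may not match what hive-bigbird-handler reads, so check them against the jar on your cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: gathers the settings a MapReduce driver would pass on.
final class DynamoDbJobSettings {

    // Class name taken from the answer above.
    static final String INPUT_FORMAT_CLASS =
            "org.apache.hadoop.hive.dynamodb.read.DynamoDBInputFormat";

    private DynamoDbJobSettings() {
    }

    // Property keys below are ASSUMPTIONS (from emr-dynamodb-connector),
    // not confirmed for hive-bigbird-handler.
    static Map<String, String> forTable(String table, String region, double readPercent) {
        if (table == null || table.isEmpty()) {
            throw new IllegalArgumentException("table name is required");
        }
        if (readPercent <= 0) {
            throw new IllegalArgumentException("readPercent must be positive");
        }
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("dynamodb.input.tableName", table);
        settings.put("dynamodb.regionid", region);
        // Share of the table's provisioned read throughput the job may use.
        settings.put("dynamodb.throughput.read.percent", String.valueOf(readPercent));
        return settings;
    }
}
```

In a driver that uses the old `org.apache.hadoop.mapred` API, each entry would be copied into the `JobConf` with `jobConf.set(key, value)`, and the class named in `INPUT_FORMAT_CLASS` would be passed to `jobConf.setInputFormat(...)`, assuming it implements the `mapred` `InputFormat` interface and the library jar is on the job's classpath.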

I hope these classes will be useful to the community.

Marlea answered 29/10, 2012 at 18:36
A
1

I couldn't find an InputFormat that you could use directly in MapReduce. However, the article "AWS HowTo: Using Amazon Elastic MapReduce with DynamoDB (Guest Post)" describes how to run MapReduce jobs against DynamoDB using Hive.

Alejandraalejandrina answered 23/10, 2012 at 5:2 Comment(4)
Thanks, I know about that, but I want to perform specific data transformations using custom code that I can't implement in Hive. - Marlea
How about using an existing Hive UDF, or creating a custom one, for the transformation? - Alejandraalejandrina
Sorry, but a custom UDF doesn't fit my business plan. I need to write a custom Hadoop MapReduce job. If this doesn't exist, I will write a DynamoDBInputFormat and post it here once I'm done. - Marlea
@Marlea, have you found an implementation? I'm looking for the same thing. I have been able to download data from the table with a custom MapReduce implementation, but I'm unable to apply scan filters (start from row ID -> end row ID). - Expertize
