I have a huge DynamoDB table that I want to analyze to aggregate data that is stored in its attributes. The aggregated data should then be processed by a Java application. While I understand the really basic concepts behind MapReduce, I've never used it before.
In my case, let's say that I have a customerId and an orderNumbers attribute in every DynamoDB item, and that I can have more than one item for the same customer. For example:
customerId: 1, orderNumbers: 2
customerId: 1, orderNumbers: 6
customerId: 2, orderNumbers: -1
Basically I want to sum the orderNumbers for each customerId, and then run some operations in Java on the aggregated values.
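To make the aggregation concrete, here is a minimal plain-Java sketch of the result I am after, run on the three sample items above. It is not MapReduce code and it does not read from DynamoDB: the OrderAggregation class, the Item type and the sumByCustomer method are hypothetical names made up for illustration, and the items are simply held in memory.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class OrderAggregation {

    /** Hypothetical in-memory stand-in for one DynamoDB item. */
    static final class Item {
        final int customerId;
        final long orderNumbers;

        Item(int customerId, long orderNumbers) {
            this.customerId = customerId;
            this.orderNumbers = orderNumbers;
        }
    }

    /** Groups the items by customerId and sums orderNumbers within each group. */
    static Map<Integer, Long> sumByCustomer(List<Item> items) {
        return items.stream().collect(Collectors.groupingBy(
                item -> item.customerId,                              // grouping key
                TreeMap::new,                                         // sorted by customerId
                Collectors.summingLong(item -> item.orderNumbers)));  // per-key sum
    }

    public static void main(String[] args) {
        // The three sample items from the question.
        List<Item> items = Arrays.asList(
                new Item(1, 2),
                new Item(1, 6),
                new Item(2, -1));

        System.out.println(sumByCustomer(items)); // prints {1=8, 2=-1}
    }
}
```

The grouping key plays the role I would expect the map step to play, and the per-key sum the role of the reduce step, but that mapping is my assumption about how this would translate to MapReduce.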
AWS Elastic MapReduce could probably help me, but I don't understand how to connect a custom JAR to DynamoDB. My custom JAR probably needs to expose both a map and a reduce function. Where can I find the right interface to implement?
Also, I'm a bit confused by the docs: it seems like I should first export my data to S3 before running my custom JAR. Is this correct?
Thanks