How to use external data with Elastic MapReduce
P

3

1

From Amazon's EMR FAQ:

Q: Can I load my data from the internet or somewhere other than Amazon S3?

Yes. Your Hadoop application can load the data from anywhere on the internet or from other AWS services. Note that if you load data from the internet, EC2 bandwidth charges will apply. Amazon Elastic MapReduce also provides Hive-based access to data in DynamoDB.

What are the specifications for loading data from external (non-S3) sources? There seems to be a dearth of resources on this option, and it doesn't appear to be documented anywhere.

Promote answered 6/6, 2012 at 16:41 Comment(2)
Not sure I understand the question. Are you asking "How do I load data from the internet into an EMR-based Hadoop instance?" Smalls
@ChrisWhite Yeah, that's exactly what I'm asking. EMR allows data to come from places other than S3, but there is no explanation of how to accomplish that. Promote
H
2

If you want to do it "the Hadoop way", you should either implement a DFS over your data source, or put references to your source URLs into a file that serves as the input for the MR job.
At the same time, Hadoop is about moving code to the data. Even EMR over S3 is not ideal from this perspective, since EC2 and S3 are different clusters. So it is hard to imagine efficient MR processing if the data source is physically outside the data center.
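
To illustrate the second option (a file of source URLs as the job input), here is a minimal sketch of a Hadoop Streaming style mapper in Python. It assumes each input line is one URL and that the remote content is line-oriented text. The names `fetch`, `map_urls` and `main` are made up for this example and are not part of any EMR or Hadoop API.

```python
import sys
from urllib.request import urlopen


def fetch(url, timeout=30):
    """Download the body at `url` and decode it as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def map_urls(lines, fetch_fn=fetch):
    """Yield (url, record) pairs for each non-empty line of each fetched body."""
    for line in lines:
        url = line.strip()
        if not url:
            continue  # ignore blank lines in the URL list
        try:
            body = fetch_fn(url)
        except OSError as exc:
            # urllib's URLError is a subclass of OSError; log and move on
            sys.stderr.write(f"skipping {url}: {exc}\n")
            continue
        for record in body.splitlines():
            if record:
                yield url, record


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Streaming entry point: URLs arrive on stdin, key<TAB>value goes to stdout."""
    for url, record in map_urls(stdin):
        stdout.write(f"{url}\t{record}\n")
```

A mapper script built on this would end with a call to `main()`. The caveat above still applies: every mapper pulls its data over the network, so throughput is bounded by the remote source rather than by the cluster.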

Hockey answered 6/6, 2012 at 21:24 Comment(0)
T
0

Basically, what Amazon is saying is that you can programmatically access any content from the internet or any other source via your code. For example, you can access a CouchDB database instance through any HTTP-based client API.
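
As a rough illustration of the CouchDB case, the sketch below reads all documents of a database over plain HTTP using only the Python standard library. It relies on CouchDB's `_all_docs` endpoint with `include_docs=true`. The host, port and database name in the usage note are placeholders, and the helper names are invented for this example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def http_get_json(url, timeout=30):
    """GET `url` and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def all_docs_url(base_url, db):
    """Build the _all_docs URL for database `db`, asking for full documents."""
    return f"{base_url.rstrip('/')}/{quote(db, safe='')}/_all_docs?include_docs=true"


def iter_docs(base_url, db, get_json=http_get_json):
    """Yield each document body from the database's _all_docs listing."""
    payload = get_json(all_docs_url(base_url, db))
    for row in payload.get("rows", []):
        doc = row.get("doc")
        if doc is not None:
            yield doc
```

Inside a mapper or a job setup step, something like `for doc in iter_docs("http://couch.example:5984", "events"): ...` would then stream documents into your own processing code. For large databases you would likely want to page through the results instead of fetching everything in one request.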

Teapot answered 19/6, 2012 at 20:48 Comment(0)
A
0

I know that the Cassandra package for Java has a source package named org.apache.cassandra.hadoop, and two classes in it are needed for reading data from Cassandra when you are running AWS Elastic MapReduce.

Essential classes: ColumnFamilyInputFormat.java and ConfigHelper.java

Go to this link to see an example of what I'm talking about.

Actuate answered 12/12, 2012 at 12:35 Comment(0)
