Maven dependencies for Hadoop: MiniDFSCluster & MiniMRCluster

I want to set up a Maven project that helps me unit test a Hadoop MapReduce job. My biggest problem is defining the Maven dependencies needed to use the test classes MiniDFSCluster and MiniMRCluster.

I am using Hadoop 2.4.1. Any ideas?

Adley asked 3/7, 2014 at 12:44

I think I figured it out. In your Maven POM file, first add the Cloudera repository:

<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>

Then add the following to your project dependencies:

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.1</version>
</dependency>
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>2.0.0-cdh4.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-test</artifactId>
    <version>2.0.0-mr1-cdh4.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0-cdh4.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0-cdh4.3.0</version>
    <classifier>tests</classifier>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0-cdh4.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0-cdh4.3.0</version>
    <classifier>tests</classifier>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>2.0.0-mr1-cdh4.3.0</version>
</dependency>

If anyone is interested in the whole project (a unit test for the well-known WordCount MapReduce job), I am willing to share it.

Adley answered 3/7, 2014 at 12:53
It is sufficient to include only hadoop-minicluster: <dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-minicluster</artifactId><version>2.7.0</version></dependency> - Petras

In case someone else is still searching for an answer:

MiniMRCluster is now deprecated.

You can get both MiniDFSCluster and MiniMRCluster from a single dependency (shown here for Gradle):

compile group: 'org.apache.hadoop', name: 'hadoop-minicluster', version: '2.7.2'
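For Maven users, a rough equivalent of that Gradle line might look like the sketch below. The test scope is my own assumption (the mini clusters are normally only needed by unit tests, whereas Gradle's compile configuration corresponds to Maven's default compile scope), and the version should presumably match the Hadoop release you actually target, e.g. 2.4.1 from the question.

```xml
<!-- Maven counterpart of the Gradle hadoop-minicluster line above.
     Assumption: test scope, because the mini clusters are only used by unit tests.
     Adjust the version to match the Hadoop release you run against. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-minicluster</artifactId>
    <version>2.7.2</version>
    <scope>test</scope>
</dependency>
```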

This dependency is essentially just a POM file that lists the artifacts providing these classes. For those who want to look it up, MiniDFSCluster lives in the hadoop-hdfs:tests artifact.

You don't need the dependencies from the Cloudera repository.
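If you only need MiniDFSCluster, a sketch of pulling the test jars straight from Maven Central (no Cloudera repository) might look like this. It simply mirrors the first answer's pairing of a main jar plus a tests jar for both hadoop-common and hadoop-hdfs, with the version switched to the 2.4.1 from the question; I have not verified that this exact set is sufficient, and as far as I know MiniMRCluster is not in these artifacts, so hadoop-minicluster remains the simpler route if you need both.

```xml
<!-- Sketch only: Apache artifacts from Maven Central instead of the CDH builds.
     Version 2.4.1 is taken from the question; adjust it to your cluster. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <!-- Test classes of hadoop-common, paired with the main jar as in the first answer -->
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.4.1</version>
    <classifier>tests</classifier>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <!-- The hadoop-hdfs:tests artifact, which contains MiniDFSCluster -->
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.4.1</version>
    <classifier>tests</classifier>
    <scope>test</scope>
</dependency>
```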

Russellrusset answered 24/9, 2016 at 10:49