How to stop hive/pig install in Amazon Data Pipeline?

I don't need Hive or Pig, but Amazon Data Pipeline installs them by default on every EMR cluster it spins up. This makes testing take longer than it should. Any ideas on how to disable the install?

Blastocoel answered 17/1, 2014 at 18:51 Comment(0)

This is not possible as of today.

The only workaround is to launch a small EMR cluster yourself and use it for testing (for example, a single m1.small master node). Then point your activities at it with 'workerGroup' rather than 'runsOn'.

Depending on the type of activity you want to use, the 'workerGroup' field might or might not be supported. But you can always wrap everything in a script (Python, shell, or anything else) and run it with ShellCommandActivity.
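
As a rough sketch of what that could look like, the snippet below builds the ShellCommandActivity part of a pipeline definition in Python and prints it as JSON. The `id`, `type`, `command`, and `workerGroup` keys are Data Pipeline fields; the activity ID, script path, and worker group name are made-up placeholders, and the helper function is only for illustration. It also assumes that Task Runner is already running on your test cluster and polling the same worker group name. The schedule and role objects a real pipeline needs are left out.

```python
import json


def shell_activity_on_worker_group(activity_id, command, worker_group):
    """Build a ShellCommandActivity object that targets a worker group.

    'workerGroup' is set and 'runsOn' is left out, so the task is picked up
    by a Task Runner polling that group rather than by a cluster that
    Data Pipeline launches for the activity.
    """
    return {
        "id": activity_id,
        "name": activity_id,
        "type": "ShellCommandActivity",
        "command": command,
        "workerGroup": worker_group,
    }


# Hypothetical values, for illustration only.
activity = shell_activity_on_worker_group(
    activity_id="RunTestScript",
    command="bash /home/hadoop/run_test.sh",
    worker_group="test-emr-workers",
)

# A real definition also needs schedule and role objects; they are omitted here.
pipeline_definition = {"objects": [activity]}
print(json.dumps(pipeline_definition, indent=2))
```

Since the cluster already exists, nothing is provisioned per run, so the Hive and Pig install steps never come into play.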


Update (correctly reminded by ChristopherB):

From AMI version 3.x onward, Hive and Pig are bundled in the AMI itself. So the install steps do not pull any new packages from S3; they only activate the daemons on the master node. Unless you are worried about them consuming instance resources (CPU, memory, etc.), it should be okay. The steps should not take a noticeable amount of time to run.

Natterjack answered 16/2, 2015 at 18:52 Comment(1)
For EMR AMI 3.x and later, the steps that add these are effectively a no-op, since Pig and Hive are already preloaded on these AMIs. - Foursquare
