Deleting file/folder from Hadoop

I'm running an EMR Activity inside an AWS Data Pipeline that analyzes log files, and when my Pipeline fails I get the following error:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://10.208.42.127:9000/home/hadoop/temp-output-s3copy already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:905)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:905)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:879)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1316)
    at com.valtira.datapipeline.stream.CloudFrontStreamLogProcessors.main(CloudFrontStreamLogProcessors.java:216)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

How can I delete that folder from Hadoop?

Enphytotic asked 28/5, 2013 at 16:47 Comment(0)

I contacted AWS support, and it turned out the problem was that the log files I was analyzing were very big, which created a memory issue. I added "masterInstanceType" : "m1.xlarge" to the EmrCluster section of my pipeline definition and it worked.
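
For reference, a minimal sketch of the relevant EmrCluster object in the pipeline definition; the id and name here are illustrative placeholders, not my actual values:

{
  "id": "EmrClusterObj",
  "name": "EmrClusterObj",
  "type": "EmrCluster",
  "masterInstanceType": "m1.xlarge"
}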

Enphytotic answered 30/5, 2013 at 19:56 Comment(1)
This is the answer to your question, but not the answer to the question's title. – Actualize

When you say delete from Hadoop, you really mean delete from HDFS.

To delete something from HDFS, do one of the following two things.

From the command line:

  • deprecated way:

hadoop dfs -rmr hdfs://path/to/file

  • new way (as of Hadoop 2.4.1):

hdfs dfs -rm -r hdfs://path/to/file

Or from Java:

FileSystem fs = FileSystem.get(getConf()); // getConf() comes from org.apache.hadoop.conf.Configured (e.g. a Tool)
fs.delete(new Path("path/to/file"), true); // true = delete recursively
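
If you are not inside a Tool/Configured class, here is a minimal standalone sketch, assuming the cluster's core-site.xml/hdfs-site.xml are on the classpath; the class name and path are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDelete {
    public static void main(String[] args) throws Exception {
        // Picks up the default file system (fs.default.name / fs.defaultFS) from the XML config on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Second argument true => delete directories recursively
        boolean deleted = fs.delete(new Path("/home/hadoop/temp-output-s3copy"), true);
        System.out.println("deleted: " + deleted);
    }
}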
Asiatic answered 28/5, 2013 at 17:1 Comment(13)
path/to/file is "10.208.42.127:9000/home/hadoop/temp-output-s3copy"? Thanks! – Enphytotic
I haven't tested it yet. My question is: should I use "10.208.42.127:9000/home/hadoop/temp-output-s3copy" as path/to/file? – Enphytotic
Usually you just specify hdfs://home/hadoop/temp-output-s3copy, since files on HDFS are often replicated to several nodes. Are you doing this on a single node? – Asiatic
Well, if this folder is on HDFS then it should work. Though the path you gave makes me think it's not on HDFS at all and is instead just a local folder. Are you doing this through the command line or Java? – Asiatic
I'm creating the pipeline through the command line, but my log analyzer is done in Java – Enphytotic
So use the command-line version: hadoop dfs -rmr hdfs://home/hadoop/temp-output-s3copy. If that doesn't work, it's because it's not on the HDFS file system. If that's the case, you can use hadoop dfs -rmr file://home/hadoop/temp-output-s3copy, or just the Unix rm -r – Asiatic
From Java, did you mean FileSystem fs = FileSystem.get(fs.getConf());? I added the fs.getConf() – Enphytotic
It really depends on the Hadoop API version. Just use whatever you need to get the current configuration; if that's fs.getConf() then use that. – Asiatic
It's not a local folder, so I'm pretty sure it is in Hadoop. I'll try this and see what happens. Thanks! – Enphytotic
So it worked the first time I ran the EMRActivity. I ran it again using the same Java class and the same Pipeline configuration but different dates, and it doesn't work: I get the exact same error. The only difference I see is in the numbers in hdfs://10.208.42.127:9000/home/hadoop/temp-output-s3copy already exists; every time I run the Pipeline I get a different number. I don't know what that means. I was advised to delete the output from S3, but it still failed. – Enphytotic
I contacted AWS support and it seemed that the problem was that the log files I was analyzing were very big, which created a memory issue. I added "masterInstanceType" : "m1.xlarge" to the EmrCluster section of my pipeline definition and it worked. Thanks – Enphytotic
How can we achieve the same with Python? – Messere
org.apache.hadoop.fs.FileSystem – Horror

To delete a file from HDFS, you can use the command below:

hadoop fs -rm -r -skipTrash /path_to_file/file_name

To delete a folder from HDFS, you can use the command below:

hadoop fs -rm -r -skipTrash /folder_name

The -skipTrash option deletes the target immediately instead of moving it to the trash; without it, the delete can fail on some setups, for example when the trash directory's space quota would be exceeded.
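
For context, whether a plain delete goes through the trash at all is controlled by fs.trash.interval in core-site.xml. A minimal sketch, with an illustrative retention period in minutes (0, the default, disables the trash entirely):

<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>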

Uninterested answered 4/7, 2015 at 10:31 Comment(0)

With Scala:

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val fs: FileSystem = FileSystem.get(new URI(filePath), sc.hadoopConfiguration)
fs.delete(new Path(filePath), true) // true for recursive

sc is the SparkContext

Nygaard answered 27/7, 2015 at 16:15 Comment(1)
Just what I was looking for: includes the recursive flag and works from the SparkContext. – Restrainer

To delete a file or folder from HDFS, use the command: hadoop fs -rm -r /FolderName

Phore answered 11/5, 2015 at 12:32 Comment(0)

From the command line:

hadoop fs -rm -r /folder
Mcquiston answered 12/5, 2014 at 19:15 Comment(0)

I use Hadoop 2.6.0; the command line hadoop fs -rm -r fileName.hib works fine for deleting any .hib file on my HDFS file system.

Enloe answered 10/8, 2015 at 20:54 Comment(0)
