hadoop fs -rm -skipTrash doesn't work

I copied some files from one directory to another using

hadoop distcp -Dmapreduce.job.queuename=adhoc /user/comverse/data/$CURRENT_DATE_NO_DASH_*/*rcr.gz /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_rcr/

I stopped the script before it finished, and a lot of .distcp.tmp.attempt files remained in the dst directory, along with the files that had finished moving.

Now I want to clean the dst directory. After running

hadoop fs -rm -skipTrash /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_mta/*

most of the files were deleted, but some remained (at least that's what HUE shows). The strange thing is that every time I run hadoop fs -rm -skipTrash, the number of remaining files reported by HUE changes, sometimes up and sometimes down.

I tried

hadoop fs -ls /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_mta/

and saw that some of the files that should have been deleted were still there. Then I ran

hadoop fs -rm -skipTrash /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_mta/*

a dozen more times, and there were always more files to delete (there still are). What is happening?

ALSO

Each time I refresh the page in HUE, the number of files grows. HALP.

EDIT

It seems that stopping distcp in the command line doesn't actually kill the job. That was the reason.

Anticipative answered 9/8, 2017 at 15:36 Comment(3)
How are you stopping it from the command line? - Lest
Can you share the distcp submit log? - Lest
@San Ctrl + C to stop it. I can't share the log. - Anticipative

Ctrl + C doesn't kill the YARN application. distcp uses the MapReduce model to copy data. When you run the distcp command, it submits a YARN application that runs on Hadoop to copy the data. You need to kill the YARN application to stop the distcp copy process.

Command to kill the YARN application:

yarn application -kill <application_id>
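
To find the <application_id> in the first place, you can list the running applications and pick out the distcp one. The following is only a sketch: the helper name distcp_app_ids is made up for illustration, and it assumes the job kept a name containing "distcp" (the default unless it was overridden), so check the listing by eye before killing anything.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of `yarn application -list` on stdin
# and prints the ID (first column) of every application whose listing line
# mentions "distcp". Assumes the distcp job name was not changed.
distcp_app_ids() {
    awk '$1 ~ /^application_/ && tolower($0) ~ /distcp/ { print $1 }'
}

# On the cluster (left commented out so nothing is killed by accident):
# yarn application -list -appStates RUNNING | distcp_app_ids
# yarn application -kill <application_id>
```

Once the application is gone, no new files should appear in the destination and the leftover ones can be removed as usual.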

Lest answered 11/8, 2017 at 4:57 Comment(0)

You could use "-R", which deletes recursively:

This removes all the files and subdirectories from your HDFS location.

hadoop fs -rm -R -skipTrash /apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin04/file_mta/*
Hagood answered 11/8, 2017 at 5:45 Comment(0)
