distcp Questions

0

I run a distcp command to copy the HDFS location of a table to another cluster. The copy is scheduled to run every 8 hours. I run the 'msck repair table' command, but not always after the copy. I ha...
Intubate asked 21/10, 2021 at 20:29

3

Solved

I have a huge bucket of S3 files that I want to put on HDFS. Given the number of files involved, my preferred solution is to use 'distributed copy'. However, for some reason I can't get hadoop distcp ...
Hydrosphere asked 23/11, 2017 at 13:16

3

Solved

On our cluster we have set up dynamic resource pools. The rules are set so that YARN first looks at the specified queue, then at the username, then at the primary group ... However, with a distcp ...
Mauriciomaurie asked 5/11, 2015 at 9:25

1

Solved

I have the following folders in HDFS: hdfs://x.x.x.x:8020/Air/BOOK/AE/DOM/20171001/2017100101 hdfs://x.x.x.x:8020/Air/BOOK/AE/INT/20171001/2017100101 hdfs://x.x.x.x:8020/Air/BOOK/BH/INT/20171001/...
Nonferrous asked 19/10, 2017 at 15:20

2

Solved

I copied some files from one directory to another using hadoop distcp -Dmapreduce.job.queuename=adhoc /user/comverse/data/$CURRENT_DATE_NO_DASH_*/*rcr.gz /apps/hive/warehouse/arstel.db/fair_usage/...
Anticipative asked 9/8, 2017 at 15:36

1

I would like to copy data from our on-premises Hadoop cluster to S3. I can do it unencrypted. I can also run s3cmd put with client-side encryption. How do I do distcp with client-side encryption?
Trust asked 17/10, 2014 at 19:5

1

Solved

I am using the AWS .NET SDK to run an s3distcp job on EMR to concatenate all files in a folder with the --groupBy arg. But whatever "groupBy" arg I have tried, it either fails every time or just copies the files ...
Malady asked 14/7, 2016 at 12:23