How to copy file from HDFS to the local file system
How can I copy a file from HDFS to the local file system? There is no physical location of the file on the machine, not even a directory. How can I move the files to my local machine for further validation? I tried through WinSCP.

Maggy answered 24/7, 2013 at 15:3 Comment(0)
287
  1. bin/hadoop fs -get /hdfs/source/path /localfs/destination/path
  2. bin/hadoop fs -copyToLocal /hdfs/source/path /localfs/destination/path
  3. Point your web browser to the HDFS Web UI (namenode_machine:50070), browse to the file you intend to copy, scroll down the page, and click the link to download the file.
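For example, a quick sketch of options 1 and 2 with purely hypothetical paths (assuming a file /user/hadoop/myfolder/myfile.txt in HDFS and an existing local directory /tmp/hdfs-copy):

bin/hadoop fs -get /user/hadoop/myfolder/myfile.txt /tmp/hdfs-copy/
bin/hadoop fs -copyToLocal /user/hadoop/myfolder/myfile.txt /tmp/hdfs-copy/

Both commands place a copy of the file in /tmp/hdfs-copy; for copying out of HDFS, -get and -copyToLocal behave the same.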
Closure answered 24/7, 2013 at 15:11 Comment(7)
Perfect, Tariq, I got it. There is no physical location of the file, not even a directory. With bin/hadoop dfs -ls /use/hadoop/myfolder I can view the file. From that I got the info that to inspect the file, you can copy it from HDFS to the local file system, so I thought I could move them with WinSCP. – Maggy
Once again I need to thank you, Tariq, for contributing your time and knowledge. You supported me a lot; this gives a lot of confidence to a newbie like me. – Maggy
I see. You can actually use the hdfs dfs -cat command if you wish to see the file's content, or open the file in the web UI. This will save you from downloading the file to your local FS. You are welcome. And if you are 100% satisfied with the answers to your questions, you can mark them so that others can benefit from it. Not just for this one, but in general. – Closure
Just to add to my last comment: if it is a binary file, cat won't show you the actual content. To view the content of a binary file you can use: bin/hadoop fs -text /path/to/file – Closure
I tried using an XML file, https://mcmap.net/q/144367/-diskerrorexception-on-slave-machine-hadoop-multinode/2499617, and got a DiskErrorException on the slave. Can you please advise? – Maggy
It seems to be a bug (now fixed). See the answer. – Closure
Is there a possibility to specify the modification/creation date of the files you copy? – Borlase
41

In Hadoop 2.0,

hdfs dfs -copyToLocal <hdfs_input_file_path> <output_path>

where,

  • hdfs_input_file_path may be obtained from http://<<name_node_ip>>:50070/explorer.html

  • output_path is the local path to which the file is to be copied.

  • You may also use get in place of copyToLocal (see the example below).
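For instance, a small sketch with made-up paths (a hypothetical HDFS file /data/input.csv and a local directory /home/user/downloads); the ls afterwards just confirms that the copy landed locally:

hdfs dfs -copyToLocal /data/input.csv /home/user/downloads/
ls -lh /home/user/downloads/input.csv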

Crispas answered 8/8, 2016 at 9:20 Comment(0)
26

In order to copy files from HDFS to the local file system, the following command can be run:

hadoop dfs -copyToLocal <input> <output>

  • <input>: the HDFS directory path (e.g. /mydata) that you want to copy
  • <output>: the destination directory path (e.g. ~/Documents)

Update: the hadoop dfs command is deprecated in Hadoop 3.

Use hdfs dfs -copyToLocal <input> <output> instead.
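For example, with the sample paths from the bullet points above, copying the whole /mydata directory from HDFS into ~/Documents would look like this (it should end up as ~/Documents/mydata locally):

hdfs dfs -copyToLocal /mydata ~/Documents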

Ferd answered 16/9, 2014 at 10:45 Comment(3)
This does not seem to work for me. It always says <input> file not found. I am using Cloudera's VM instance, which has CentOS 6.4. – Colorant
@Colorant Are you sure the file is actually there? Can you browse it via hadoop fs -ls? – Karlise
Just use hadoop dfs -get <input> <output>; it'll work. To list files, use hadoop dfs -ls <path>. – Bosco
7

You can accomplish this in either of these ways:

1. hadoop fs -get <HDFS file path> <Local system directory path>
2. hadoop fs -copyToLocal <HDFS file path> <Local system directory path>

Example:

My file is located at /sourcedata/mydata.txt and I want to copy it to the local file system path /user/ravi/mydata:

hadoop fs -get /sourcedata/mydata.txt /user/ravi/mydata/
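Equivalently, the second form with the same paths:

hadoop fs -copyToLocal /sourcedata/mydata.txt /user/ravi/mydata/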
Uranian answered 6/2, 2017 at 19:46 Comment(0)
7

If your source "file" is split up among multiple files (perhaps as the result of MapReduce) that live in the same directory tree, you can copy them to a single local file with:

hadoop fs -getmerge /hdfs/source/dir_root/ local/destination
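For example, a sketch with a hypothetical MapReduce output directory /user/me/wordcount-output containing part-r-00000, part-r-00001, and so on; -getmerge concatenates them into one local file:

hadoop fs -getmerge /user/me/wordcount-output/ ./wordcount.txt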
Brownfield answered 7/5, 2019 at 17:16 Comment(2)
This should be accepted. This is what most people are looking for, not a split-up file. – Drool
This would be the best answer, to be honest. Usually HDFS files/tables are split into parts like 0000_0, 0001_0 inside those directories. -getmerge will merge all of those and put them into one file in the local directory. Kudos to @Brownfield. – Niklaus
3

This worked for me on my VM instance of Ubuntu.

hdfs dfs -copyToLocal [hadoop directory] [local directory]

Boxhaul answered 6/3, 2018 at 15:0 Comment(0)
1

1. Remember the name you gave the file when you used hdfs dfs -put, and use 'get' instead. See below.

$ hdfs dfs -get /output-fileFolderName-In-hdfs
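A small sketch of the round trip, with a hypothetical local file results.txt (the HDFS name /output-fileFolderName-In-hdfs is just the placeholder used above; the trailing . copies it back into the current local directory):

$ hdfs dfs -put results.txt /output-fileFolderName-In-hdfs
$ hdfs dfs -get /output-fileFolderName-In-hdfs .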

Coloring answered 25/2, 2021 at 23:57 Comment(0)
0

If you are using Docker, you have to do the following steps (a combined sketch follows after the list):

  1. Copy the file from HDFS to the namenode container (hadoop fs -get output/part-r-00000 /out_text). "/out_text" will be stored on the namenode.

  2. Copy the file from the namenode to the local disk (docker cp namenode:/out_text output.txt).

  3. output.txt will then be in your current working directory.
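Put together, a minimal sketch of the sequence (assuming the container is named namenode and the job output is output/part-r-00000, as in the steps above; running the hadoop command via docker exec is one way to execute it inside the container):

# HDFS -> the namenode container's own disk
docker exec namenode hadoop fs -get output/part-r-00000 /out_text
# container disk -> the host's current working directory
docker cp namenode:/out_text output.txt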

Scherer answered 3/9, 2019 at 13:44 Comment(0)
-3
bin/hadoop fs -put /localfs/destination/path /hdfs/source/path 
Bryan answered 29/1, 2016 at 2:46 Comment(1)
hdfs dfs -put is a command to push files from the local FS to HDFS; hdfs dfs -get is the right option here. – Suzansuzann
