The getmerge command was created specifically for merging files from HDFS into a single file on the local file system.
This command is very useful for downloading the output of a MapReduce job, which may have generated multiple part-* files, and combining them into a single local file that you can use for other operations (e.g. putting it in an Excel sheet for a presentation).
Answers to your questions:
If the destination file system does not have enough space, an IOException is thrown. Internally, getmerge uses the IOUtils.copyBytes() function (see IOUtils.copyBytes()) to copy one file at a time from HDFS to the local file. This function throws an IOException whenever there is an error in the copy operation.
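To make the "one file at a time" behaviour concrete, here is a minimal sketch of the same idea using only the Java standard library on local files. It is not Hadoop's implementation: the class name LocalMergeSketch, the helper mergeParts(), and the choice to sort the part files by name are all assumptions made for illustration. Any write failure, such as a full destination disk, surfaces as an IOException from the copy loop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LocalMergeSketch {

    // Hypothetical helper: appends every "part-*" file in srcDir to dst.
    // Sorting by name is an assumption of this sketch, not a documented
    // guarantee of getmerge.
    public static void mergeParts(Path srcDir, Path dst) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        Collections.sort(parts);

        // One output stream stays open; each source is copied into it in turn.
        try (OutputStream out = Files.newOutputStream(dst)) {
            byte[] buffer = new byte[4096];
            for (Path part : parts) {
                try (InputStream in = Files.newInputStream(part)) {
                    int n;
                    // A failed read or write throws IOException and aborts the merge.
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a job output directory with two part files.
        Path dir = Files.createTempDirectory("job-output");
        Files.write(dir.resolve("part-00000"), "alpha\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("part-00001"), "beta\n".getBytes(StandardCharsets.UTF_8));

        Path merged = Files.createTempFile("merged", ".txt");
        mergeParts(dir, merged);
        System.out.print(new String(Files.readAllBytes(merged), StandardCharsets.UTF_8));
    }
}
```

The copy loop only depends on InputStream and OutputStream, so the same pattern should carry over to streams obtained from Hadoop's FileSystem.open() and FileSystem.create(), which return subclasses of those types.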
This command is similar to the hadoop fs -get command, which copies a file from HDFS to the local file system. The only difference is that hadoop fs -getmerge merges multiple files from HDFS into a single file on the local file system.
If you want to merge multiple files within HDFS, you can use the copyMerge() method of the FileUtil class (see FileUtil.copyMerge()).
This API copies all files in a directory into a single file, merging all the source files. Note that copyMerge() is available in Hadoop 2.x; it was removed from FileUtil in Hadoop 3.0.