How to update a file in HDFS

I know that HDFS is write-once, read-many.
If I want to update a file in HDFS, is there any way to do it?

Thank you in advance!

Buchner answered 24/8, 2016 at 17:59 Comment(0)

Option 1:

If you just want to append to an existing file, either:

  1. Pipe the text in: echo "<Text to append>" | hdfs dfs -appendToFile - /user/hduser/myfile.txt

  2. Run hdfs dfs -appendToFile - /user/hduser/myfile.txt, type the text in the terminal, and press Ctrl+D when you are done.
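For several lines at once, the stdin form of -appendToFile can be wrapped in a small function and fed from a here-document. This is only a sketch: append_lines is a hypothetical helper name, and it assumes the target file already exists on a cluster where append is enabled.

```shell
# Sketch: append whatever arrives on stdin to an existing HDFS file.
# append_lines is a hypothetical helper name, not part of the hdfs CLI.
append_lines() {
  # "-" tells -appendToFile to read from standard input.
  hdfs dfs -appendToFile - "$1"
}

# Usage (needs a reachable cluster):
#   append_lines /user/hduser/myfile.txt <<'EOF'
#   first new line
#   second new line
#   EOF
```

Running hdfs dfs -tail /user/hduser/myfile.txt afterwards is a quick way to check that the lines arrived.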

Option 2:

Get the original file from HDFS to the local filesystem, modify it, and then put it back on HDFS. The -f flag makes put overwrite the existing file; without it, put fails because the destination already exists.

  1. hdfs dfs -get /user/hduser/myfile.txt

  2. vi myfile.txt  # or modify it with any other tool

  3. hdfs dfs -put -f myfile.txt /user/hduser/myfile.txt
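The three steps above can also be scripted when the change is mechanical. The following is a minimal sketch, not a hardened tool: edit_hdfs_file is a hypothetical helper name, and sed stands in for whatever editor or program does the modification.

```shell
# Sketch: get an HDFS file, modify it locally with sed, and put it back.
# Usage: edit_hdfs_file <hdfs-path> <sed-expression>
# edit_hdfs_file is a hypothetical helper name, not part of the hdfs CLI.
edit_hdfs_file() (
  hdfs_path="$1"
  sed_expr="$2"

  # Scratch directory on the local filesystem for the downloaded copy.
  workdir=$(mktemp -d) || exit 1
  local_copy="$workdir/$(basename "$hdfs_path")"

  # 1. Copy the file from HDFS to the local scratch directory.
  # 2. Write the modified content to a second local file.
  # 3. Overwrite the HDFS file; -f replaces the existing destination.
  hdfs dfs -get "$hdfs_path" "$local_copy" &&
    sed "$sed_expr" "$local_copy" > "$local_copy.new" &&
    hdfs dfs -put -f "$local_copy.new" "$hdfs_path"
  status=$?

  # Remove the scratch directory whether or not the steps succeeded.
  rm -rf "$workdir"
  exit "$status"
)

# Usage (needs a reachable cluster):
#   edit_hdfs_file /user/hduser/myfile.txt 's/old/new/'
```

The function body runs in a subshell, so its variables do not leak into the calling shell, and the HDFS file is only overwritten if both the download and the edit succeeded.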

Buffer answered 25/8, 2016 at 4:57 Comment(2)
I'm aware this is almost a year old, but for anyone who may be looking for the answer: in Option 2, putting a file that already exists will cause an error. You'll need to move, rename, or remove the file from HDFS first. You can rename it with hdfs dfs -mv /home/hduser/myfile.txt /home/hduser/old_myfile.txt - Squiggle
Or you can simply add the -f flag to the put command. Updated answer ;) - Buffer

To add lines, put them in a local file and append that file to the existing HDFS file:

hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile

To modify any portion of a file that is already written, you have three options:

  1. Get the file from HDFS and modify its content locally:

    hdfs dfs -copyToLocal /hdfs/source/path /localfs/destination/path

    or

    hdfs dfs -cat /hdfs/source/path | modify...

  2. Use a processing framework such as MapReduce or Apache Spark to apply the update. The result is written as a new directory of files, after which you remove the old files. This is probably the best approach.

  3. Install the NFS Gateway or FUSE; both support append operations.

    NFS Gateway

    Hadoop FUSE: mountableHDFS allows HDFS to be mounted (on most flavors of Unix) as a standard file system using the mount command. Once mounted, the user can operate on an instance of HDFS using standard Unix utilities such as ‘ls’, ‘cd’, ‘cp’, ‘mkdir’, ‘find’, and ‘grep’.
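The hdfs dfs -cat variant in option 1 can be extended into a pipeline that writes the modified stream straight back to HDFS, since put accepts "-" to read from standard input. This is a sketch under assumptions: stream_rewrite is a hypothetical helper name, the .new and .bak suffixes are arbitrary choices, and sed stands in for the "modify" step.

```shell
# Sketch: rewrite an HDFS file through a pipe without a local copy.
# Usage: stream_rewrite <hdfs-path> <sed-expression>
# stream_rewrite and the .new/.bak suffixes are hypothetical choices.
stream_rewrite() (
  src="$1"
  sed_expr="$2"
  new="$src.new"   # scratch path for the rewritten content
  bak="$src.bak"   # where the original is kept as a backup

  # Stream the file out of HDFS, transform it, and stream the result
  # into a new HDFS file. Without pipefail, only a failure of the final
  # put is detected here, which is why the original is kept as a backup.
  hdfs dfs -cat "$src" | sed "$sed_expr" | hdfs dfs -put -f - "$new" || exit 1

  # Rename the original out of the way, then move the new file into place.
  # The first -mv fails if the backup path already exists.
  hdfs dfs -mv "$src" "$bak" || exit 1
  hdfs dfs -mv "$new" "$src"
)

# Usage (needs a reachable cluster):
#   stream_rewrite /hdfs/source/path 's/old/new/'
```

Writing to a separate path and renaming afterwards avoids reading and overwriting the same file in one pipeline; once the result has been checked, the backup can be removed with hdfs dfs -rm.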

Chessboard answered 24/8, 2016 at 18:27 Comment(0)
