Append data to existing file in HDFS Java

3

24

I'm having trouble appending data to an existing file in HDFS. If the file exists, I want to append a line to it; if not, I want to create a new file with the given name.

Here's my method to write into HDFS.

if (!file.exists(path)){
   file.createNewFile(path);
}

FSDataOutputStream fileOutputStream = file.append(path); 
BufferedWriter br = new BufferedWriter(new OutputStreamWriter(fileOutputStream));
br.append("Content: " + content + "\n");
br.close();

This method does write to HDFS and creates the file, but as I mentioned, it does not append.

This is how I test my method:

RunTimeCalculationHdfsWrite.hdfsWriteFile("RunTimeParserLoaderMapperTest2", "Error message test 2.2", context, null);

The first parameter is the file name and the second is the message; the other two parameters are not important.

Does anyone have an idea what I'm missing or doing wrong?

Cohdwell answered 10/4, 2014 at 19:20 Comment(3)
The first thing you need to know is that HDFS is a write-once file system. We cannot append to or overwrite files in HDFS, but we can read them as many times as we like. Please see the book Hadoop: The Definitive Guide for this. – Skive
What is the type of the variable file? – Kahle
Check out slideshare.net/dataera/inside-hdfs-append – Natividadnativism
43

Actually, you can append to an HDFS file:

From the client's perspective, an append operation first calls append on DistributedFileSystem, which returns a stream object, FSDataOutputStream out. To append data to the file, the client calls out.write to write and out.close to close.

I checked the HDFS sources, and there is a DistributedFileSystem#append method:

 FSDataOutputStream append(Path f, final int bufferSize, final Progressable progress) throws IOException

For details, see the presentation.

You can also append from the command line:

hdfs dfs -appendToFile <localsrc> ... <dst>

Add lines directly from stdin:

echo "Line-to-add" | hdfs dfs -appendToFile - <dst>
Zoril answered 27/5, 2015 at 12:19 Comment(1)
Fantastic answer indeed. +1 for the source code snippet. – Gaylagayle
9

Solved..!!

Append is supported in HDFS.

You just need some configuration and a little code, as shown below:

Step 1: Set dfs.support.append to true in hdfs-site.xml:

<property>
   <name>dfs.support.append</name>
   <value>true</value>
</property>

Stop all your daemon services using stop-all.sh, then restart them using start-all.sh.

Step 2 (optional): Only if you have a single-node cluster, set the replication factor to 1 as shown below:

From the command line:

./hdfs dfs -setrep -R 1 filepath/directory

Or you can do the same at runtime from Java code:

fsShell.setrepr((short) 1, filePath);  

Step 3: Code for creating the file or appending data to it:

public void createAppendHDFS() throws IOException {
    Configuration hadoopConfig = new Configuration();
    // hdfsuri: field holding the NameNode URI, defined elsewhere in the class
    hadoopConfig.set("fs.defaultFS", hdfsuri);
    FileSystem fileSystem = FileSystem.get(hadoopConfig);
    String filePath = "/test/doc.txt";
    Path hdfsPath = new Path(filePath);
    // Single-node clusters only (Step 2); fsShell is a helper defined elsewhere in the class
    fsShell.setrepr((short) 1, filePath);
    FSDataOutputStream fileOutputStream = null;
    try {
        if (fileSystem.exists(hdfsPath)) {
            // File already exists: open it for append
            fileOutputStream = fileSystem.append(hdfsPath);
            fileOutputStream.writeBytes("appending into file. \n");
        } else {
            // File does not exist yet: create it and write the first line
            fileOutputStream = fileSystem.create(hdfsPath);
            fileOutputStream.writeBytes("creating and writing into file\n");
        }
    } finally {
        try {
            // Close the stream first so buffered data is flushed before the file system goes away
            if (fileOutputStream != null) {
                fileOutputStream.close();
            }
        } finally {
            fileSystem.close();
        }
    }
}

Kindly let me know for any other help.

Cheers.!!

Bringhurst answered 6/5, 2017 at 20:4 Comment(2)
What is fShell? – Tresa
It's fsShell; that was a typo. – Bringhurst
2

HDFS does not allow append operations. One way to implement the same functionality as appending is:

  • Check whether the file exists.
  • If the file doesn't exist, create a new file and write to it.
  • If the file exists, create a temporary file.
  • Read each line from the original file and write it to the temporary file (don't forget the newline).
  • Write the lines you want to append to the temporary file.
  • Finally, delete the original file and move (rename) the temporary file to the original file's name.
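
The steps above can be sketched roughly as follows. This is only an illustration that uses java.nio.file on the local file system as a stand-in, not the HDFS API; the class name RewriteAppend, the method appendLines, and the ".tmp" suffix are hypothetical names chosen for the example. Against HDFS, the same flow would presumably go through FileSystem calls such as exists, create, open, delete, and rename.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: emulates append by rewriting the file through a temporary copy. */
final class RewriteAppend {

    static void appendLines(Path target, List<String> newLines) throws IOException {
        if (!Files.exists(target)) {
            // File doesn't exist: create it and write the new lines directly.
            Files.write(target, newLines, StandardCharsets.UTF_8);
            return;
        }

        // File exists: build the combined content in a temporary file next to it.
        Path temp = target.resolveSibling(target.getFileName() + ".tmp");
        try (BufferedReader reader = Files.newBufferedReader(target, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            // Copy every original line, re-adding the newline that readLine() strips.
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
            }
            // Write the lines to be appended.
            for (String added : newLines) {
                writer.write(added);
                writer.newLine();
            }
        }

        // Delete the original, then rename the temporary file to the original name.
        Files.delete(target);
        Files.move(temp, target);
    }
}
```

Note that the delete-then-rename step is not atomic, so a reader could briefly see no file at all, and the whole file is rewritten on every call, which gets expensive for large files.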
Kahle answered 11/4, 2014 at 18:38 Comment(2)
OK, I actually modified hdfs-site.xml by adding two properties, and it works for me. These are the two properties I used: <property> <name>dfs.replication</name> <value>2</value> </property> <property> <name>dfs.support.append</name> <value>true</value> </property> – Cohdwell
Just tested the other response, and yours was tied, so I had to test and make sure -appendToFile actually works. – Squelch
