Write a file in HDFS with Java
I want to create a file in HDFS and write data to it. I used this code:

Configuration config = new Configuration();
FileSystem fs = FileSystem.get(config);
Path filenamePath = new Path("input.txt");
try {
    // Remove any previous copy before creating the file
    if (fs.exists(filenamePath)) {
        fs.delete(filenamePath, true);
    }

    FSDataOutputStream fin = fs.create(filenamePath);
    fin.writeUTF("hello");
    fin.close();
} catch (IOException e) {
    e.printStackTrace();
}

It creates the file, but it doesn't write anything to it. I searched a lot but didn't find anything. What is the problem? Do I need any permission to write to HDFS?

Thanks.

Maquis answered 14/4, 2013 at 15:34 Comment(2)
This code creates an HDFS file with a single partition. Can we set the number of partitions for input.txt? Likelihood
How do I import FileSystem (what is the full class path)? Neral
F
75

As an alternative to @Tariq's answer, you could pass the URI when getting the filesystem:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.conf.Configuration;
import java.net.URI;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Progressable;
import java.io.BufferedWriter;
import java.io.OutputStream;
import java.io.OutputStreamWriter;

Configuration configuration = new Configuration();
// Passing the URI selects HDFS explicitly instead of relying on the default file system
FileSystem hdfs = FileSystem.get( new URI( "hdfs://localhost:54310" ), configuration );
Path file = new Path("hdfs://localhost:54310/s2013/batch/table.html");
if ( hdfs.exists( file )) { hdfs.delete( file, true ); }
OutputStream os = hdfs.create( file,
    new Progressable() {
        public void progress() {
            // progress() takes no arguments; a count such as bytesWritten must be tracked separately
            System.out.println("...still writing");
        } });
BufferedWriter br = new BufferedWriter( new OutputStreamWriter( os, "UTF-8" ) );
br.write("Hello World");
br.close();
hdfs.close();
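
One difference from the code in the question is worth spelling out: the question calls writeUTF, while this answer wraps the stream in a writer. FSDataOutputStream extends java.io.DataOutputStream, whose writeUTF puts a two-byte length in front of the text, so the file does not contain plain text. The sketch below uses only the JDK and assumes a ByteArrayOutputStream as a stand-in for the HDFS stream; the class and method names are made up for illustration.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the in-memory sink stands in for an HDFS output stream.
class WriteUtfDemo {

    // Bytes produced by DataOutputStream.writeUTF: a 2-byte length, then modified UTF-8.
    static byte[] viaWriteUTF(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Bytes produced by a writer: just the UTF-8 encoding of the text.
    static byte[] viaWriter(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("writeUTF bytes: " + viaWriteUTF("hello").length);
        System.out.println("writer bytes:   " + viaWriter("hello").length);
    }
}
```

For "hello", writeUTF yields seven bytes (two for the length, five for the text) and the writer yields five. If the file is meant to be read as text, for example with hdfs dfs -cat, the writer form is probably what you want.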
Furred answered 23/7, 2013 at 19:41 Comment(6)
How to get variable 'bytesWritten'?Japhetic
Try looking at the OutputStream docs? ex: docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.htmlFurred
import statements would be helpful... Where is Configuration coming from in particular?Utgardloki
Configuration and many other come from org.apache.hadoop.* from the 'org.apache.hadoop:hadoop-common:jar:X.X.X' library that you pickHadwin
Import statements, in case anyone is wondering: import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.conf.Configuration; import java.net.URI; import org.apache.hadoop.fs.Path; import org.apache.hadoop.util.Progressable; import java.io.BufferedWriter; import java.io.OutputStreamWriter; Interpol
A general piece of advice: be careful with the FileSystem.close() call. The FS instance may be cached somewhere in the system (see the description of FileSystem.get()), and there will be issues when something else tries to use that cached instance after it was explicitly closed like that. Kuhn
B
24

Either set the HADOOP_CONF_DIR environment variable to your Hadoop configuration folder, or add the following two lines to your code:

config.addResource(new Path("/HADOOP_HOME/conf/core-site.xml"));
config.addResource(new Path("/HADOOP_HOME/conf/hdfs-site.xml"));

If you don't do one of these, your client will try to write to the local file system instead of HDFS, which results in the permission denied exception.
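
Before hard-coding those two paths, it can help to check that the files actually exist where you expect them. The following is a minimal sketch using only the JDK; HadoopConfLocator and findSiteFiles are hypothetical names, not part of Hadoop, and it assumes the directory contains the usual core-site.xml and hdfs-site.xml.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: lists the site files present in a conf directory.
class HadoopConfLocator {

    // Returns the site files that exist in confDir; empty if confDir is unset or has none.
    static List<Path> findSiteFiles(String confDir) {
        List<Path> found = new ArrayList<>();
        if (confDir == null || confDir.isEmpty()) {
            return found;
        }
        for (String name : new String[] {"core-site.xml", "hdfs-site.xml"}) {
            Path candidate = Paths.get(confDir, name);
            if (Files.isRegularFile(candidate)) {
                found.add(candidate);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Path> files = findSiteFiles(System.getenv("HADOOP_CONF_DIR"));
        if (files.isEmpty()) {
            System.out.println("No site files found; the client would use its built-in defaults.");
        }
        for (Path file : files) {
            System.out.println("Found " + file);
        }
    }
}
```

Each path found could then be passed on as config.addResource(new org.apache.hadoop.fs.Path(file.toString())). Note that java.nio.file.Path and Hadoop's Path are different classes, so one of them has to be fully qualified if both are used in the same file.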

Bog answered 14/4, 2013 at 22:50 Comment(0)
J
1

This should do the trick

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.*;
import java.nio.charset.StandardCharsets;


public static void writeFileToHDFS() throws IOException {
    Configuration configuration = new Configuration();
    configuration.set("fs.defaultFS", "hdfs://localhost:9000");
    configuration.addResource(new Path("/HADOOP_HOME/conf/core-site.xml"));
    configuration.addResource(new Path("/HADOOP_HOME/conf/hdfs-site.xml"));
    FileSystem fileSystem = FileSystem.get(configuration);
    // Create a path
    String fileName = "input.txt";
    Path hdfsWritePath = new Path("/user/yourdesiredpath/" + fileName);
    // The second argument (true) overwrites the file if it already exists
    FSDataOutputStream fsDataOutputStream = fileSystem.create(hdfsWritePath, true);

    BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(fsDataOutputStream, StandardCharsets.UTF_8));
    bufferedWriter.write("Java API to write data in HDFS");
    bufferedWriter.close();
    fileSystem.close();
}
Joselynjoseph answered 9/2, 2021 at 9:59 Comment(0)
I
-2

Please try the below approach.

FileSystem fs = path.getFileSystem(conf);
SequenceFile.Writer inputWriter = new SequenceFile.Writer(fs, conf, path, LongWritable.class, MyWritable.class);
inputWriter.append(new LongWritable(uniqueId++), new MyWritable(data));
inputWriter.close();
Implantation answered 14/4, 2013 at 16:33 Comment(2)
user just wants to write a file, not specifically Sequence file.Bog
Have you included the job configuration stubs? Agnosia
