Which nodejs library should I use to write into HDFS?
Asked Answered
M

2

8

I have a nodejs application and I want to write data into hadoop HDFS file system. I have seen two main nodejs libraries that can do it: node-hdfs and node-webhdfs. Someone have tried it? Any hints? Which one should I use in production?

I am inclined to use node-webhdfs since it uses WebHDFS REST API. node-hdfs seem to be a c++ binding.

Any help will be greatly appreciated.

Monster answered 5/1, 2014 at 1:34 Comment(0)
U
10

You may want to check out webhdfs library. It provides nice and straightforward (similar to fs module API) interface for WebHDFS REST API calls.

Writing to the remote file:

var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();

var localFileStream = fs.createReadStream('/path/to/local/file');
var remoteFileStream = hdfs.createWriteStream('/path/to/remote/file');

localFileStream.pipe(remoteFileStream);

remoteFileStream.on('error', function onError (err) {
  // Do something with the error
});

remoteFileStream.on('finish', function onFinish () {
  // Upload is done
});

Reading from the remote file:

var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();

var remoteFileStream = hdfs.createReadStream('/path/to/remote/file');

remoteFileStream.on('error', function onError (err) {
  // Do something with the error
});

remoteFileStream.on('data', function onChunk (chunk) {
  // Do something with the data chunk
});

remoteFileStream.on('finish', function onFinish () {
  // Upload is done
});
Ungava answered 13/2, 2014 at 10:42 Comment(2)
this works for me, be sure to use the latest stable version of nodejs (not the git repo, that caused me some issues)Berni
how do you find the path to the remote file?Epicure
G
5

Not good news!!!

Do not use node-hdfs. Although it seems promising, it is now two years obsolete. I've tried to compile it but it does not match the symbols of current libhdfs. If you want to use something like that you'll have to make your own nodejs binding.

You can use node-webhdfs but IMHO there's not much advantage on that. It is better to use an http nodejs lib to make your own requests. The hardest part here is try to hold the very async nature of nodejs, since you might want first to create a folder, and then after successfully create it, create a file and then, at last, write or append data. Everything through http requests that you must send and wait the for answer to then go on....

At least node-webhdfs might be a good reference to you take a look and start your own code.

Br, Fabio Moreira

Gowk answered 7/2, 2014 at 18:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.