Is there any way to download a HDFS file using WebHDFS REST API? [closed]

Is there any way by which I can download a file from HDFS using the WebHDFS REST API? The closest I have come is to use the OPEN operation to read the file and save the content:

curl -i -L "http://localhost:50075/webhdfs/v1/demofile.txt?op=OPEN" -o ~/demofile.txt
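
For reference, the documented WebHDFS flow is two-step: the OPEN request goes to the namenode, which replies with an HTTP 307 redirect to a datanode, and curl's -L flag follows it. A minimal sketch of that flow, assuming the default namenode HTTP port 50070 and a placeholder user name:

curl -L "http://localhost:50070/webhdfs/v1/demofile.txt?op=OPEN&user.name=<user>" -o ~/demofile.txt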

Is there any API that will allow me to download the file directly without having to open it? I went through the official documentation and tried Google as well, but could not find anything. Could somebody point me in the right direction or give me some pointers?

Thank you so much for your valuable time.

Haller answered 31/5, 2013 at 20:9 Comment(9)
What is wrong with the approach you're describing? You'll need to read the file at some point anyway if you want to download it locally.Kaki
Thank you for the reply, sir. I just want to download the file as it is and put it into a directory on my local FS for now; reading the file is not my intention at this moment. Also, if I follow the above approach I end up with a file that includes the headers as well: "HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 218 Server: Jetty(6.1.26)"Haller
The WebHDFS API is for programmatic use, so using OPEN is as close as it gets if you want to use it... you still need some code to create the file.Hyoscyamus
Not really sure how exactly this question is off-topic. Discussing APIs is what SO is meant for.Haller
The API call looks perfectly fine - unless the question is updated with a good reason for why anyone should care, everyone is probably feeling like the OP is wasting our time with a useless question.Nelsonnema
Sometimes reading the question with an open mind really helps. I mentioned very clearly in my question that the API call shown here is the closest I have got, and certainly not exactly what I intend to achieve. I have been an active contributor on SO for years and I know very well what wasting time is. 'Fine' is a relative word: what's fine with you might not be so fine with me. You can look at my last comment against the answer to verify that; the way I was doing it was not 'fine' and I had to change something to make it work.Haller
And as far as whether or not you wish to help fellow SO users is concerned, it's totally your call. 8 upvotes and 3 stars are a fine indication of how useful/useless this question is.Haller
The headers are included because the -i flag includes headers. Remove that, and you should have the "reference implementation".Haft
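In other words, dropping the -i flag writes only the response body to the output file; a minimal sketch using the same URL as in the question:

curl -L "http://localhost:50075/webhdfs/v1/demofile.txt?op=OPEN" -o ~/demofile.txt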
@Haller I'm flagging this to be reopened. As a Hadoop administrator, I know these topics are not always cut-and-dried, and most of the default documentation leaves out key elements or details. This post should stay open for future answers and discussion around the WebHDFS API (10k views says it all).Baldric

You could probably use the DataNode API for this (it runs on port 50075 by default); it supports a streamFile command which you could take advantage of. Using wget, this would look something like:

wget http://$datanode:50075/streamFile/demofile.txt -O ~/demofile.txt

Note that this request needs to go to the datanode itself, not to the namenode!

Alternatively, if you don't know which datanode to hit, you can ask the namenode and it will redirect you to the right datanode with this URL:

http://$namenode:50070/data/demofile.txt
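
Since wget follows HTTP redirects by default, downloading through the namenode would look something like this (a sketch; $namenode is a placeholder for the namenode host):

wget "http://$namenode:50070/data/demofile.txt" -O ~/demofile.txt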
Kaki answered 31/5, 2013 at 22:22 Comment(13)
Thank you for the reply, sir. I had tried this once but it was giving me "ERROR 500: File does not exist: /.".Haller
Can you show me what command you ran?Kaki
wget localhost:50075/streamFile?filename=/demofile.txt -O ~/demofile.txtHaller
What happens if you do filename=demofile.txt instead of filename=/demofile.txt?Kaki
I'm getting the same error.Haller
Weird, I'll try this on Monday and let you know what I find; if the file exists, this should download it for you.Kaki
Exactly, I was expecting the same. I'll also try and let you know if something clicks. Thanks again.Haller
And the file does exist, with proper permissions. I have checked that twice.Haller
@Haller Edited my answer with more details; it looks like you actually don't use "filename=", but put the file path directly after streamFile.Kaki
Thank you so very much, sir. We actually don't even need "-O ~/demofile.txt"; simply running "wget http://$datanode:50075/streamFile/demofile.txt" does the trick. Thanks again.Haller
Is there any way we can download multiple files when we know only the folder name and not the file names?Dulin
Do I need to give a user password when reading files with WebHDFS in Java?Sextuplicate
As of Hadoop 3.0.0, the datanode HTTP port 50075 has moved to 9864 (and the namenode HTTP port 50070 to 9870).Laquanda
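With the Hadoop 3 defaults, the equivalent WebHDFS download would look something like this (a sketch; the namenode host and user name are placeholders):

curl -L "http://<namenode>:9870/webhdfs/v1/demofile.txt?op=OPEN&user.name=<user>" -o ~/demofile.txt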