I'm connected to the cluster via SSH, and I submit the program to it with:
spark-submit --master yarn myProgram.py
I want to save the result in a text file and I tried using the following lines:
counts.write.json("hdfs://home/myDir/text_file.txt")
counts.write.csv("hdfs://home/myDir/text_file.csv")
However, neither of them works. The program finishes, but I cannot find the file in myDir. Do you have any idea how I can do this?
Also, is there a way to write directly to my local machine?
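(For context: with --master yarn, a file:/// output path is written on whichever cluster nodes run the tasks, not on the machine you ssh from, so the usual options are to collect a small result onto the driver and write it there, or to copy the file out of HDFS afterwards with hdfs dfs -get. A minimal sketch of the first option, assuming counts is small enough to fit in driver memory and /tmp/counts.json is a placeholder path:

    # Sketch: bring a *small* result back to the driver and write it locally.
    # Assumes `counts` fits in driver memory; /tmp/counts.json is a placeholder.
    rows = counts.toJSON().collect()   # one JSON string per row
    with open("/tmp/counts.json", "w") as f:
        f.write("\n".join(rows) + "\n")
)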
EDIT: I found out that the home directory doesn't exist, so now I save the result as:
counts.write.json("hdfs:///user/username/text_file.txt")
But this creates a directory named text_file.txt, and inside it are many files with partial results. I want one file with the final result. Any ideas how I can do this?
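(One common way, if the result fits comfortably in a single partition, is to coalesce to one partition before writing. A minimal sketch, reusing the path above; the output is still a directory, but it contains a single part file:

    # Sketch: force a single partition so Spark emits one part file.
    # The output path is still a directory; the data lands in part-00000*.
    counts.coalesce(1).write.json("hdfs:///user/username/text_file.txt")

Another option is to let Spark write its many part files and merge them afterwards from the shell with hdfs dfs -getmerge /user/username/text_file.txt /tmp/text_file.json.)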
Comments:

What do you get when you run hdfs dfs -ls hdfs://home/myDir? Is there a /home/myDir to write to? – Impetus

I get -ls: java.net.UnknownHostException: home, so I guess this folder doesn't exist. Usually, when I want to save a file, in which directory should I put it? – Mitigate

/home is the Linux user directory.... In HDFS, it's /user. – Impetus

The UnknownHostException is because your path is wrong. It should be hdfs:///home/myDir, or better, remove hdfs:// from everywhere, as mentioned. – Impetus
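(Following that advice, a minimal sketch of the corrected call, assuming the cluster's default filesystem is HDFS and username stands in for the real user:

    # Sketch: rely on the default filesystem (HDFS on this cluster),
    # so no hdfs:// scheme or host is needed in the path.
    counts.write.json("/user/username/text_file.txt")
)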