Try
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already bound as `spark` in the pyspark shell
df = spark.read.parquet("/path/to/infile.parquet")
df.write.csv("/path/to/outfile.csv")
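Note that DataFrameWriter.csv writes a directory of part files, not a single file. If the result is small enough to pass through one task, a common workaround is to coalesce to a single partition first; a minimal sketch (the header option is an optional addition for readability):

df.coalesce(1).write.option("header", True).csv("/path/to/outfile.csv")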
Relevant API documentation: DataFrameReader.parquet and DataFrameWriter.csv.
Both /path/to/infile.parquet and /path/to/outfile.csv should be locations on the HDFS filesystem. You can specify the scheme explicitly as hdfs://..., or omit it, since hdfs is usually the default scheme.
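For example, with the scheme spelled out (the namenode host and port here are placeholders for your cluster's configuration):

df = spark.read.parquet("hdfs://namenode:8020/path/to/infile.parquet")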
Avoid file://... paths: a local path refers to a different file on every machine in the cluster. Write to HDFS instead, then transfer the results to your local disk from the command line:
hdfs dfs -get /path/to/outfile.csv /path/to/localfile.csv
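Because the output path is a directory of part files, -get copies the whole directory. To merge the parts into a single local file in one step, use hdfs dfs -getmerge:

hdfs dfs -getmerge /path/to/outfile.csv /path/to/localfile.csv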
Or display it directly from HDFS (cat the part files inside the output directory, since the path itself is a directory):
hdfs dfs -cat /path/to/outfile.csv/part-*
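As a quick sanity check, you can also read the result back into a DataFrame (the header option assumes you wrote one, as in the coalesce sketch above):

df2 = spark.read.option("header", True).csv("/path/to/outfile.csv")
df2.show(5)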