It is very simple to read a standard CSV file, for example:
val t = spark.read.format("csv")
.option("inferSchema", "true")
.option("header", "true")
.load("file:///home/xyz/user/t.csv")
It reads a real CSV file, something like:
fieldName1,fieldName2,fieldName3
aaa,bbb,ccc
zzz,yyy,xxx
and t.show produced the expected result.
I need the inverse: to write a standard CSV file (not a directory of non-standard files).
It is very frustrating that write does not produce the inverse result. Maybe some other option or some kind of format (" REAL csv please! ") exists.
NOTES
I am using Spark v2.2 and running tests in spark-shell.
The "syntactical inverse" of read is write, so write is expected to produce the same file format. But the result of
t.write.format("csv").option("header", "true").save("file:///home/xyz/user/t-writed.csv")
is not a standard RFC 4180 CSV file like the original t.csv, but a t-writed.csv/ folder containing the files
part-00000-66b020ca-2a16-41d9-ae0a-a6a8144c7dbc-c000.csv.deflate and _SUCCESS,
which seem to be "parquet", "ORC" or some other format.
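Spark always treats the path given to save() or csv() as a directory: it writes one part file per partition plus a _SUCCESS marker, so a folder is expected even with format("csv"). The .csv.deflate suffix indicates CSV text compressed with the deflate codec rather than Parquet or ORC. The question does not say why compression was enabled; a Hadoop setting such as mapreduce.output.fileoutputformat.compress is one possible cause (an assumption). The minimal sketch below, assuming Spark 2.x and a local session, shows the closest Spark gets on its own: coalesce(1) for a single part file and the compression option set to "none" for plain text. The temporary output path and the in-memory copy of t are illustrative only; in spark-shell you can skip them and apply the write chain to your own t.

```scala
import java.nio.file.{Files, Path}
import org.apache.spark.sql.SparkSession

// Local session for the sketch; inside spark-shell, getOrCreate() returns the existing session.
val spark = SparkSession.builder.master("local[*]").appName("csv-write-sketch").getOrCreate()
import spark.implicits._

// In-memory copy of the rows in t.csv, so the sketch needs no input file.
val t = Seq(("aaa", "bbb", "ccc"), ("zzz", "yyy", "xxx"))
  .toDF("fieldName1", "fieldName2", "fieldName3")

// Hypothetical target: Spark creates this path as a directory, not as a file.
val outDir: Path = Files.createTempDirectory("csv-sketch").resolve("t-writed.csv")

t.coalesce(1)                      // one partition, so one part file
  .write
  .option("header", "true")
  .option("compression", "none")   // plain text: no .deflate suffix
  .csv(outDir.toUri.toString)

// List what Spark produced inside the output directory.
val written: Seq[String] = outDir.toFile.listFiles.map(_.getName).sorted.toSeq
written.foreach(println)
```

Expect a _SUCCESS marker and a single part-*.csv file; on the local file system Hadoop may also add hidden .crc checksum files. The result is still a folder, so getting one standalone file needs an extra move step after the write.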
In any language, a toolkit that can "read something" is also able to "write the same something"; it is a kind of orthogonality principle.
Similar questions that do not solve it
The similar questions and links below did not solve the problem, perhaps because they use an incompatible Spark version, or perhaps because of a spark-shell limitation. They still have good clues for experts:
This similar question, pointed out by @JochemKuijpers: I tried the suggestion but got the same ugly result.
This link says that there is a solution (!), but I can't copy/paste saveDfToCsv() into my spark-shell ("error: not found: type DataFrame"). Any clue?
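A likely clue for the "not found: type DataFrame" error: spark-shell pre-imports things such as spark.implicits._ and spark.sql, but not the DataFrame type itself, so pasted code that declares a DataFrame parameter needs import org.apache.spark.sql.DataFrame first. The sketch below is a hypothetical helper, not the linked saveDfToCsv(), and assumes Spark 2.x with a Hadoop-compatible file system. It writes one uncompressed part file into a temporary sibling directory (target + ".tmp", assumed to be free), moves that part file to the target path with the Hadoop FileSystem API, and deletes the leftover folder.

```scala
import java.io.IOException
import org.apache.hadoop.fs.{FileStatus, Path}
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper (not the linked saveDfToCsv()): writes `df` as one
// standalone CSV file at `target`, a full URI such as file:///home/xyz/user/t-writed.csv.
def writeSingleCsv(df: DataFrame, target: String): Unit = {
  // Assumption: the sibling path target + ".tmp" is free to be created and deleted.
  val tmpDir = target + ".tmp"
  df.coalesce(1)                        // all rows into one part file
    .write
    .mode(SaveMode.Overwrite)
    .option("header", "true")
    .option("compression", "none")
    .csv(tmpDir)

  // Use the same Hadoop file system Spark just wrote to (local, HDFS, ...).
  val tmpPath = new Path(tmpDir)
  val fs = tmpPath.getFileSystem(df.sparkSession.sparkContext.hadoopConfiguration)

  val parts: Array[FileStatus] =
    Option(fs.globStatus(new Path(tmpPath, "part-*.csv"))).getOrElse(Array.empty[FileStatus])
  require(parts.length == 1, s"expected one part file in $tmpDir, found ${parts.length}")

  val targetPath = new Path(target)
  fs.delete(targetPath, false)          // replace an existing target file, if any
  if (!fs.rename(parts.head.getPath, targetPath))
    throw new IOException(s"could not move ${parts.head.getPath} to $target")
  fs.delete(tmpPath, true)              // remove the leftover folder and its _SUCCESS marker
}
```

After pasting it into spark-shell, a call such as writeSingleCsv(t, "file:///home/xyz/user/t-writed.csv") should leave a plain t-writed.csv file. Because coalesce(1) sends every row through a single task, this only suits small outputs like the one in the question; on the local file system Hadoop may also leave a hidden .crc checksum file next to the target.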
"simple small and standard CSV file" <-- there's no such thing... A CSV file is simple, for humans. It is, basically, uncompressed text, so it can't be small. And there's no standard CSV. – Tomfoolery
"very simple (one line)" -> Note that putting all your code on one line does not make it simpler. Lines with more than one statement or function call on them are typically harder, not easier, to read, understand and reason about. – Dodgson
t.write.option("header", "true").csv("file:///C:/out.csv"). – Kafir
… .write.format("csv") be unable to generate something that can in turn be re-read by .read.format("csv"). – Accustom