I use the `sqlContext.read.parquet` function in PySpark to read Parquet files every day. The data has a timestamp column. The upstream source changed the timestamp field from `2019-08-26T00:00:13.600+0000` to `2019-08-26T00:00:13.600Z`. It reads fine in Databricks, but throws an `Illegal Parquet type: INT64 (TIMESTAMP_MICROS)` error when I try to read it on a Spark cluster. How do I read this new column using the `read.parquet` function itself?
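For context, this is roughly the daily read (the path is a placeholder for the real location):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# placeholder path; the real job reads the previous day's partition.
# On the cluster this fails with:
#   Illegal Parquet type: INT64 (TIMESTAMP_MICROS)
df = spark.read.parquet("/data/events/2019-08-26/")
df.select("ts").show(5, truncate=False)
```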
Currently I use `from_unixtime(unix_timestamp(ts, "yyyy-MM-dd HH:mm:ss.SSS"), "yyyy-MM-dd") as ts` to convert `2019-08-26T00:00:13.600+0000` to the `2019-08-26` format.
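Spelled out as a full job step, that conversion looks like this (the path and column name are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("/data/events/")  # placeholder path

# today's conversion: parse the old +0000 timestamps, keep only the date
daily = df.selectExpr(
    'from_unixtime(unix_timestamp(ts, "yyyy-MM-dd HH:mm:ss.SSS"), '
    '"yyyy-MM-dd") AS ts'
)
```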
How do I convert `2019-08-26T00:00:13.600Z` to `2019-08-26`?
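For what it's worth, my best guess is a pattern that treats the `T` as a literal and matches the trailing `Z` as an ISO-8601 zone offset; the `X` pattern letter below is my assumption and I haven't verified it on the cluster:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# illustrative sample row in the new format
df = spark.createDataFrame([("2019-08-26T00:00:13.600Z",)], ["ts"])

# assumption: 'T' as a literal, X for the ISO-8601 zone (accepts 'Z')
converted = df.withColumn(
    "ts",
    F.from_unixtime(
        F.unix_timestamp("ts", "yyyy-MM-dd'T'HH:mm:ss.SSSX"),
        "yyyy-MM-dd",
    ),
)
converted.show()  # date is rendered in the session time zone
```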