pyspark.sql.utils.AnalysisException: Parquet data source does not support void data type
I am trying to add an empty column to my DataFrame df1 in PySpark.

The code I tried:

import pyspark.sql.functions as F
df1 = df1.withColumn("empty_column", F.lit(None))

But I get this error:

pyspark.sql.utils.AnalysisException: Parquet data source does not support void data type.

Can anyone help me with this?

Bloxberg answered 18/10, 2022 at 18:36 Comment(0)
Instead of a bare F.lit(None), cast the literal to a concrete data type, e.g.:

F.lit(None).cast('string')
F.lit(None).cast('double')

When we add a literal null column without a cast, its data type is void:

from pyspark.sql import functions as F
spark.range(1).withColumn("empty_column", F.lit(None)).printSchema()
# root
#  |-- id: long (nullable = false)
#  |-- empty_column: void (nullable = true)

But the Parquet format does not support the void data type, so such columns must be cast to another data type before saving.

Helterskelter answered 18/10, 2022 at 20:24 Comment(0)
