I'm using Jupyter Notebook with PySpark. Within it I have a DataFrame whose schema defines column names and types (integer, ...) for those columns. But when I use methods like flatMap, the result is a collection of tuples that no longer have any fixed types. Is there a way to preserve them?
df.printSchema()
root
|-- name: string (nullable = true)
|-- ...
|-- ...
|-- ratings: integer (nullable = true)
Then I use flatMap to do some calculations with the rating values (obfuscated here):
y_rate = df.flatMap(lambda row: (row.id, 5 if (row.ratings > 5) else row.ratings))
y_rate.toDF().printSchema()
And now I get an error:
TypeError: Can not infer schema for type:
Is there any way to use map/flatMap/reduce while keeping the schema? Or at least to return tuples whose values have a specific type?