PySpark - Resolving isnan errors with TimeStamp datatype
I'm trying to create a function to check the quality of data (NaNs, nulls, etc.). I have the following code running on a PySpark DataFrame:

from pyspark.sql import functions as f

df.select([f.count(f.when((f.isnan(c) | f.col(c).isNull()), c)).alias(c) for c in cols_check]).show()

As long as the columns to check are strings or integers, I have no issue. However, when I check columns with a date or timestamp datatype, I receive the following error:

cannot resolve 'isnan(Date_Time)' due to data type mismatch: argument 1 requires (double or float) type, however, 'Date_Time' is of timestamp type.;;\n'Aggregate...

There are clearly null values in the column; how can I remedy this?
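For reference, here is a minimal sketch that reproduces the error (the session setup and sample data are made up for illustration):

from pyspark.sql import SparkSession, functions as f

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame with a nullable timestamp column
df = spark.createDataFrame(
    [("2021-12-23 07:07:00",), (None,)], ["Date_Time"]
).withColumn("Date_Time", f.col("Date_Time").cast("timestamp"))

cols_check = ["Date_Time"]

# Fails at analysis time: isnan() requires a double or float argument,
# and Spark will not implicitly cast a timestamp to double
df.select([f.count(f.when((f.isnan(c) | f.col(c).isNull()), c)).alias(c) for c in cols_check]).show()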

Shoeshine answered 23/12, 2021 at 7:7

You can use df.dtypes to check the type of each column, and handle the null count differently for timestamp and date columns, like this:

from pyspark.sql import functions as F

df.select(*[
    (
        # Non-temporal columns: count both NaNs and nulls
        F.count(F.when((F.isnan(c) | F.col(c).isNull()), c)) if t not in ("timestamp", "date")
        # Timestamp and date columns cannot hold NaN, so count only nulls
        else F.count(F.when(F.col(c).isNull(), c))
    ).alias(c)
    for c, t in df.dtypes if c in cols_check
]).show()
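As a quick sanity check, here is a sketch on a small, made-up DataFrame (the data and column names are illustrative). Running the snippet above against it counts the NaN in amount and the null in Date_Time:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1.0, "2021-12-23 07:07:00"), (float("nan"), None)],
    ["amount", "Date_Time"],
).withColumn("Date_Time", F.col("Date_Time").cast("timestamp"))

cols_check = ["amount", "Date_Time"]

# amount is a double, so it takes the isnan | isNull branch;
# Date_Time is a timestamp, so it takes the isNull-only branch.
# Expected output:
# +------+---------+
# |amount|Date_Time|
# +------+---------+
# |     1|        1|
# +------+---------+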
Furring answered 23/12, 2021 at 14:29
Comment from Malta: # List of columns to check: cols_check = ["VendorID", "passenger_count"]
