I'm trying to create a function to check the quality of data (NaNs, nulls, etc.). I have the following code running on a PySpark DataFrame:
df.select([f.count(f.when((f.isnan(c) | f.col(c).isNull()), c)).alias(c) for c in cols_check]).show()
As long as the columns to check are strings or integers, I have no issue. However, when I check columns with a datatype of date or timestamp, I receive the following error:
cannot resolve 'isnan(Date_Time)' due to data type mismatch: argument 1 requires (double or float) type, however, 'Date_Time' is of timestamp type.;;\n'Aggregate...
There are clearly null values in the column. How can I remedy this?
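One direction I'm considering is branching on `df.dtypes` so that `isnan()` is only applied to `float`/`double` columns, and everything else gets a plain `IS NULL` check. Below is a Spark-free sketch of just that branching logic; the `null_check_condition` helper and the sample column list are made up for illustration, and the `(name, type)` pairs mimic what `df.dtypes` returns:

```python
# Sample (column, type) pairs in the shape PySpark's df.dtypes returns.
dtypes = [
    ("id", "int"),
    ("score", "double"),
    ("name", "string"),
    ("Date_Time", "timestamp"),
]

def null_check_condition(col_name, col_type):
    """Build a SQL condition string for counting missing values.

    isnan() is only valid for float/double columns, so date/timestamp
    (and other) columns fall back to a plain IS NULL check.
    """
    if col_type in ("float", "double"):
        return f"(isnan({col_name}) OR {col_name} IS NULL)"
    return f"{col_name} IS NULL"

conditions = {name: null_check_condition(name, t) for name, t in dtypes}
for name, cond in conditions.items():
    print(f"{name}: {cond}")
```

In the actual query, each condition string could presumably be wrapped with something like `f.count(f.when(f.expr(cond), c)).alias(c)` inside the `select`, but I haven't verified that this is the idiomatic way to do it.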