I'm getting this error
Can't pickle <class 'google.protobuf.pyext._message.CMessage'>: it's not found as google.protobuf.pyext._message.CMessage
when I try to create a UDF in PySpark. Apparently, it uses CloudPickle to serialize the command however, I'm aware that protobuf messages contains C++
implementations, meaning it cannot be pickled.
I've tried finding a way to perhaps override CloudPickleSerializer
, however, I wasn't able to find a way.
Here's my example code:
from MyProject.Proto import MyProtoMessage
from google.protobuf.json_format import MessageToJson
import pyspark.sql.functions as F
def proto_deserialize(body):
msg = MyProtoMessage()
msg.ParseFromString(body)
return MessageToJson(msg)
from_proto = F.udf(lambda s: proto_deserialize(s))
base.withColumn("content", from_proto(F.col("body")))
Thanks in advance.