I have a pipeline that I can execute locally without any errors. I used to get this error when running the pipeline locally:
'Clients have non-trivial state that is local and unpickleable.'
PicklingError: Pickling client objects is explicitly not supported.
I believe I fixed this by downgrading to apache-beam==2.3.0; after that, it ran locally without issues.
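For context, the shape of the code that originally triggered the error looked roughly like this; BrokenDoFn is just an illustrative name, not my actual code:

import apache_beam as beam
from google.cloud import datastore

class BrokenDoFn(beam.DoFn):
    # Anti-pattern: the client is created at pipeline-construction time,
    # so Beam tries to pickle it together with the DoFn and raises the
    # PicklingError above.
    def __init__(self):
        self._dsclient = datastore.Client()
    def process(self, element):
        # element is assumed to be a Datastore key here
        yield self._dsclient.get(element)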
Now I am using the DataflowRunner, and my requirements.txt file contains the following dependencies:
apache-beam==2.3.0
google-cloud-bigquery==1.1.0
google-cloud-core==0.28.1
google-cloud-datastore==1.6.0
google-cloud-storage==1.10.0
protobuf==3.5.2.post1
pytz==2013.7
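For completeness, this is roughly how I hand that file to Dataflow; the project and bucket names below are placeholders for my real values:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder project/bucket; --requirements_file is what tells Dataflow
# to install the packages above on the workers.
options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-project',
    '--temp_location=gs://my-bucket/temp',
    '--staging_location=gs://my-bucket/staging',
    '--requirements_file=requirements.txt',
])

with beam.Pipeline(options=options) as p:
    pass  # pipeline steps go here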
but I get this dreaded error again:
'Clients have non-trivial state that is local and unpickleable.'
PicklingError: Pickling client objects is explicitly not supported.
Why does it give me the error with DataflowRunner but not with DirectRunner? Shouldn't they be using the same dependencies/environment? Any help would be appreciated.
I had read that creating the client in start_bundle() is the way to solve this, but when I try it I still get the same error:
import apache_beam as beam
from google.cloud import datastore

class MyDoFn(beam.DoFn):
    def start_bundle(self):  # no context argument in the 2.x SDK
        # Create the client on the worker so it is never pickled.
        self._dsclient = datastore.Client()
    def process(self, element):
        # do stuff with self._dsclient
        yield element
from https://github.com/GoogleCloudPlatform/google-cloud-python/issues/3191
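One variant I have also seen suggested (not from that issue, just a sketch of the idea) is to initialize the client lazily inside process(), so nothing unpickleable exists at submission time; LazyClientDoFn is a hypothetical name:

import apache_beam as beam
from google.cloud import datastore

class LazyClientDoFn(beam.DoFn):
    def __init__(self):
        self._dsclient = None  # None pickles fine
    def process(self, element):
        if self._dsclient is None:
            # First call on this worker: create the client here.
            self._dsclient = datastore.Client()
        # do stuff with self._dsclient
        yield element

Since the attribute stays None until the first process() call, the DoFn serializes cleanly and the real client only ever exists worker-side.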
My previous reference post where I fixed this locally:
Using start_bundle() in apache-beam job not working. Unpickleable storage.Client()
Thanks in advance!