Ways of using value provider parameter in Python Apache Beam

Right now I can only read the runtime value inside a class used in a ParDo. Is there another way to use the runtime parameter, for example in my own functions?

This is the code I have right now:

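import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions


class UserOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Value provider arguments are resolved when the template runs,
        # not when it is built.
        parser.add_value_provider_argument('--firestore_document', default='')


def run(argv=None):
    # PipelineOptions parses the flags itself, so no separate ArgumentParser is needed.
    pipeline_options = PipelineOptions(argv)
    user_options = pipeline_options.view_as(UserOptions)
    pipeline_options.view_as(SetupOptions).save_main_session = True

    with beam.Pipeline(options=pipeline_options) as p:
        # CallFirestore (a DoFn) and ReadDB2 are not shown here.
        rows = (
            p
            | 'Create inputs' >> beam.Create([''])
            | 'Call Firestore' >> beam.ParDo(
                CallFirestore(user_options.firestore_document))
            | 'Read DB2' >> beam.Map(ReadDB2))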

I want user_options.firestore_document to be usable in other functions without having to go through a ParDo.

Darg answered 4/10, 2019 at 16:21 Comment(10)
Hi! What do you mean by runtime value? Do you mean this user_options.firestore_document variable? - Chatty
What do you mean by wanting user_options.firestore_document to be usable in other functions without having to do a ParDo? - Crucify
@Chatty Hi! Yes, that is my runtime value. It is only filled in when the template is executed, not when the template is built. Right now, if I try to read that value inside any function that isn't a ParDo function, I get errors about it being empty. - Darg
@guillaumeblaquiere What I mean is that the variable is only passed when executing the template, not when creating it. If I try to use it in any other function, the template won't generate and I start getting errors about my variable being empty. - Darg
What other functions are you thinking of? Can you give an example? This should work for any ParDo, so it should work fine in your code. - Chatty
@Chatty Yes, it works inside the ParDo, but how can we use it inside a Create, for example? Is it possible to use a RuntimeValueProvider outside a ParDo? What I am trying to achieve is a single template that, depending on the runtime parameters, reads from one table or another and then writes to one table or another. - Darg
It can only be used from a ParDo, not from Create, but a ParDo should be enough for what you need. Are you trying to use it within CallFirestore? - Chatty
@Chatty Oh, so those can only be used inside a ParDo :( I was able to achieve what I was trying to do with the ParDo, but thought there was another way to do it. Thanks! - Darg
I'll add an answer to see if it helps others. - Chatty
This is my exact point. I would like to use the value of the argument in functions inside Dataflow that aren't ParDos. Did you have any luck with this? - Nils

Value providers can only be used inside ParDos and Combines. It is not possible to pass a value provider to a Create, but you can define a DoFn that takes the value provider in its constructor and outputs its value:

class OutputValueProviderFn(beam.DoFn):
  def __init__(self, vp):
    self.vp = vp

  def process(self, unused_elm):
    yield self.vp.get()

And in your pipeline, you would do the following:

user_options = pipeline_options.view_as(UserOptions)

with beam.Pipeline(options=pipeline_options) as p:
  my_value_provided_pcoll = (
      p
      | beam.Create([None])
      | beam.ParDo(OutputValueProviderFn(user_options.firestore_document)))

That way you don't use it in a Create, which is not possible, but you still get the value into a PCollection.
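
To illustrate why the value is only readable inside process(), here is a toy model in plain Python. It is not Beam's implementation: DeferredValue, set_at_runtime and EmitValueFn are hypothetical names made up for this sketch, and it only assumes that a runtime value provider refuses .get() until the job is actually running.

```python
class DeferredValue:
    """Toy stand-in for a runtime value provider (hypothetical, not a Beam class)."""

    _UNSET = object()

    def __init__(self, name):
        self.name = name
        self._value = self._UNSET

    def is_accessible(self):
        # True only once the "runtime" has supplied a value.
        return self._value is not self._UNSET

    def set_at_runtime(self, value):
        # Stands in for the runner filling in the parameter at execution time.
        self._value = value

    def get(self):
        if not self.is_accessible():
            raise RuntimeError(f"{self.name}.get() called before runtime")
        return self._value


class EmitValueFn:
    """Mimics the DoFn pattern: store the provider, resolve it only in process()."""

    def __init__(self, vp):
        self.vp = vp  # keep the provider itself; do not call .get() here

    def process(self, unused_elm):
        yield self.vp.get()


# "Template build" phase: the provider exists, but its value does not.
doc = DeferredValue("firestore_document")
fn = EmitValueFn(doc)  # constructing the function object is fine
try:
    doc.get()  # reading the value this early fails
    build_time_error = None
except RuntimeError as exc:
    build_time_error = str(exc)

# "Template execution" phase: the value is supplied, then process() runs.
doc.set_at_runtime("users/alice")
results = list(fn.process(None))
```

The same ordering applies in the real pipeline: anything evaluated while the pipeline is being constructed (such as the argument to Create) runs in the "build" phase, whereas process() runs in the "execution" phase.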

Chatty answered 10/10, 2019 at 16:58 Comment(1)
Hey @pablo, that was really helpful to me today, thanks! But where was I supposed to find that information? Is it in the docs? I tried to find it but couldn't. Any recommendations for sources? - Beggs
