I am trying to run an Apache Beam pipeline on the Dataflow runner. The job reads data from a BigQuery table and writes data to a database.
I am running the job with the classic template option in Dataflow, which means I first have to stage the pipeline and then run it with the appropriate arguments.
My pipeline is set up as below:
import apache_beam as beam
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions, PipelineOptions, SetupOptions)

options = PipelineOptions()
options.view_as(SetupOptions).save_main_session = True
importer_options = options.view_as(ImporterOptions)
google_options = options.view_as(GoogleCloudOptions)

with beam.Pipeline(options=options) as p:
    p | 'BigQuery Read' >> beam.io.ReadFromBigQuery(
        table=importer_options.input_table)
ImporterOptions currently accepts input_table as a value provider argument:
parser.add_value_provider_argument(
    '--input-table',
    help='The BigQuery input table in the format dataset.table_name')
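For completeness, that argument is registered inside a custom PipelineOptions subclass, roughly like this (only the relevant argument shown):
class ImporterOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Registered as a ValueProvider so it can be supplied at template run time
        parser.add_value_provider_argument(
            '--input-table',
            help='The BigQuery input table in the format dataset.table_name')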
But running the pipeline throws the error below:
File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/bigquery.py", line 791, in split if not self.table_reference.projectId: AttributeError: 'RuntimeValueProvider' object has no attribute 'projectId'
Does anyone have any idea what I am missing here?
I am building the template using the command below:
python -m main \
    --runner DataflowRunner \
    --project test-project \
    --region europe-west1 \
    --staging_location gs://test/staging_python \
    --temp_location gs://test/test \
    --template_location gs://test/templates_python/test
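Once the template is staged, I launch it roughly like this (the job name is a placeholder; I pass the parameter under its underscored argparse dest name):
gcloud dataflow jobs run test-import-job \
    --gcs-location gs://test/templates_python/test \
    --region europe-west1 \
    --parameters input_table=dataset.table_name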
Note - I also tried running the pipeline with the fully qualified table name as input_table (i.e. including the project id), but that didn't help either.
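That attempt was essentially the same launch command with the project id included, e.g.:
--parameters input_table=test-project:dataset.table_name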