Missing object or bucket in path when running on Dataflow

When trying to run a pipeline on the Dataflow service, I specify the staging and temp buckets (in GCS) on the command line. When the program executes, I get a RuntimeException before my pipeline runs; the root cause is that something is missing from the path:

    Caused by: java.lang.RuntimeException: Failed to construct instance from factory method
        DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
    ...
    Caused by: java.lang.IllegalArgumentException: Missing object or bucket in path:
        'gs://df-staging-bucket-57763/', did you mean: 'gs://some-bucket/df-staging-bucket-57763'?

gs://df-staging-bucket-57763/ already exists in my project, and I have access to it. What do I need to add to make this work?

Alary asked 31/5, 2017 at 17:36

The DataflowRunner requires that the staging location and temp location point to a path within a bucket rather than the top level of a bucket. Adding a directory to each argument, such as --stagingLocation=gs://df-staging-bucket-57763/staging and --tempLocation=gs://df-staging-bucket-57763/temp, is sufficient to run the pipeline. (If gcpTempLocation is not set explicitly, it defaults to the value of tempLocation.)
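
The same requirement applies when setting the options programmatically instead of on the command line. Below is a minimal sketch, assuming the Beam Java SDK with the Dataflow runner on the classpath; the class name StagingPathExample and the project ID are placeholders, and the bucket name is taken from the question:

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class StagingPathExample {
      public static void main(String[] args) {
        DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(DataflowPipelineOptions.class);
        options.setProject("my-project-id"); // placeholder project ID
        options.setRunner(DataflowRunner.class);
        // Point inside the bucket, not at the bucket root, to avoid
        // "Missing object or bucket in path":
        options.setStagingLocation("gs://df-staging-bucket-57763/staging");
        options.setGcpTempLocation("gs://df-staging-bucket-57763/temp");

        Pipeline p = Pipeline.create(options);
        // ... add transforms here ...
        p.run();
      }
    }

Passing --stagingLocation=gs://df-staging-bucket-57763/staging and --tempLocation=gs://df-staging-bucket-57763/temp on the command line achieves the same thing, since PipelineOptionsFactory.fromArgs parses those flags.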

Alary answered 31/5, 2017 at 17:36

Update the run configuration as follows:

  1. Under the Pipeline Arguments tab, uncheck the "Use Default Dataflow options" flag and select the pipeline arguments manually.
  2. Leave the "Cloud Storage staging location" field blank.
Minium answered 25/9, 2019 at 15:46
