Google Cloud Storage: Output path does not exist or is not writeable

I am trying to follow this simple Dataflow example from the Google Cloud site.

I have successfully installed the Dataflow pipeline plugin and the gcloud SDK (as well as Python 2.7). I have also set up a project on Google Cloud and enabled billing and all the necessary APIs, as specified in the instructions above.

However, when I go to the run configurations and change the Pipeline Arguments tab to select BlockingDataflowPipelineRunner, after creating a bucket and setting my project ID, hitting run gives me:

Caused by: java.lang.IllegalArgumentException: Output path does not exist or is not writeable: gs://my-cloud-dataflow-bucket
    at com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
    at com.google.cloud.dataflow.sdk.util.DataflowPathValidator.verifyPathIsAccessible(DataflowPathValidator.java:79)
    at com.google.cloud.dataflow.sdk.util.DataflowPathValidator.validateOutputFilePrefixSupported(DataflowPathValidator.java:62)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.fromOptions(DataflowPipelineRunner.java:255)
    at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.fromOptions(BlockingDataflowPipelineRunner.java:82)
    ... 9 more

I have used my terminal to execute 'gcloud auth login' and I see in the browser that I am successfully logged in.

I am really not sure what I have done wrong here. Can anyone confirm whether this is a known issue with the Dataflow pipeline and Google Cloud Storage buckets?

Thanks!

Armillda answered 19/3, 2016 at 10:16 Comment(1)
Can you try running gsutil ls gs://my-cloud-dataflow-bucket on the command line? (I'll give a generic answer first, and then follow up with a more specific one once we figure out the root cause.)Cabalist

I had a similar issue with GCS bucket permissions: I certainly had write permission, and I could upload files into the bucket. What solved the problem for me was acquiring the roles/dataflow.admin role on the project I was submitting the pipeline to.
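
For reference, a role like that can also be granted from the command line. A minimal sketch, assuming the gcloud SDK is installed, with a hypothetical project ID and account email:

    # Project ID and email are placeholders; substitute your own values.
    gcloud projects add-iam-policy-binding my-project-id \
        --member="user:you@example.com" \
        --role="roles/dataflow.admin"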

Chamberlin answered 17/1, 2019 at 1:4 Comment(0)

When you submit a pipeline to the Google Cloud Dataflow service, the pipeline runner on your local machine uploads the files necessary for execution in the cloud to a "staging location" in Google Cloud Storage.

The pipeline runner on your local machine seems unable to write the required files to the staging location you provided (gs://my-cloud-dataflow-bucket). It could be that the location doesn't exist, that it belongs to a different GCP project than the one you authenticated against, or that more restrictive permissions are set on that bucket.
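
For context, the staging location is just another pipeline option. A minimal sketch of the equivalent command-line invocation for the Dataflow SDK 1.x used here, where the jar name, main class, and project ID are hypothetical placeholders:

    # All names below are placeholders for your own values.
    java -cp target/my-pipeline-bundled.jar com.example.WordCount \
        --runner=BlockingDataflowPipelineRunner \
        --project=my-project-id \
        --stagingLocation=gs://my-cloud-dataflow-bucket/staging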

You can start debugging the issue with the gsutil command-line tool. For example, try running gsutil ls gs://my-cloud-dataflow-bucket to list the contents of the bucket, then try an upload with gsutil cp. This should produce enough information to root-cause the issue you are facing.
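
A minimal debugging sequence along those lines, using the bucket name from the question (the test file is a placeholder):

    # Check that the bucket exists and that you can read it
    gsutil ls gs://my-cloud-dataflow-bucket

    # Then check write access by uploading a small test file
    echo "probe" > /tmp/probe.txt
    gsutil cp /tmp/probe.txt gs://my-cloud-dataflow-bucket/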

Cabalist answered 19/3, 2016 at 17:58 Comment(7)
I ran gsutil ls gs://my-cloud-dataflow-bucket in my terminal (with the current project set to rosh-test) and got: AccessDeniedException: 403 Forbidden. I should point out that in the Eclipse Dataflow plugin, when creating the project, I specified the name of the bucket and then clicked 'create'. Eclipse told me the creation of the bucket was successful. However, when I check on GCP to see if the bucket exists, it says it doesn't.Armillda
Furthermore, when I try to manually create the same bucket, it says that I can't have two buckets with the same name! Whilst in GCP, I started up gsutil and ran: gsutil acl ch -u [email protected]:W gs://my-cloud-dataflow-bucket. However, that also gives a 403 Forbidden error.Armillda
A few things to check: make sure your account is at least an Editor on the project, don't forget to run gcloud auth login. Also, when creating the bucket, make sure the project name is specified. If this fails, I suggest creating the bucket manually in the Developers Console and just using it in Eclipse.Cabalist
Hey Davor. In the GCP Storage section of the UI, I changed both the bucket permissions and the default bucket permissions so that owners, editors, and viewers all have 'owner' permissions set. I also added a new entry for my specific email address. Via the terminal I executed: gsutil cp somefile.txt gs://my-cloud-dataflow-bucket. I saw that the file was uploaded, so the permissions seem OK. However, when I run my Eclipse program, I still get the error: the bucket does not exist or is not writeable :(Armillda
You seem to have made progress on the issue. Before, you were getting 403 for all actions; now you seem to be successfully copying files, which is a definite sign of progress. Could it be that your Eclipse environment somehow cannot access your home directory and/or your command-line environment variables?Cabalist
How did you solve the problem? I am having the same issue: I can gsutil cp on the CLI, but I get the error on the Java side. Thanks.Jan
Has anyone found a solution to this issue? I am also seeing this error.Rathskeller

Try providing the zone parameter; it fixed a similar error in my case. And of course, export the GOOGLE_APPLICATION_CREDENTIALS environment variable before running your app.

 ...
 -Dexec.args="--runner=DataflowRunner \
 --gcpTempLocation=gs://bucket/tmp \
 --zone=bucket-zone \
 ...
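
For completeness, a minimal sketch of the full invocation; the key-file path, main class, project ID, and zone are hypothetical placeholders:

    # All values below are placeholders; substitute your own.
    export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

    mvn compile exec:java \
        -Dexec.mainClass=com.example.MyPipeline \
        -Dexec.args="--runner=DataflowRunner \
        --project=my-project-id \
        --gcpTempLocation=gs://bucket/tmp \
        --zone=us-central1-f"
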
Accelerator answered 10/3, 2020 at 13:25 Comment(0)

Got the same error. Fixed it by setting GOOGLE_APPLICATION_CREDENTIALS in ~/.bash_profile on Mac, pointing it at a key file that has write permissions.
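
A one-line sketch of that setting; the key-file path is a placeholder:

    # In ~/.bash_profile; point this at your own service-account key file.
    export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/dataflow-key.json"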

Trumaine answered 16/6, 2022 at 19:38 Comment(1)
This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From ReviewStryker

I realised I needed to use a specific ACL command via gsutil. Setting my account to have owner permissions did not do the job. Instead, using:

gsutil acl set public-read-write gs://my-bucket-name-here

worked in this case. Hope this helps someone!
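
As the comment below points out, public-read-write is broader than necessary. A narrower sketch that grants write access to a single account (the email is a placeholder):

    # Grant one user write access instead of making the bucket public.
    gsutil acl ch -u you@example.com:W gs://my-bucket-name-here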

Armillda answered 20/3, 2016 at 15:18 Comment(1)
We should not encourage users to set public-read-write on their buckets. This is not necessary. Editors of the project need to have write access, as well as the service accounts. Then, you need to authenticate as one of the editors, and that should be enough.Cabalist
