I am trying to follow this simple Dataflow example from the Google Cloud site.
I have successfully installed the Dataflow pipeline plugin and the gcloud SDK (as well as Python 2.7). I have also set up a project on Google Cloud and enabled billing and all the necessary APIs, as specified in the instructions above.
However, when I go to the run configurations and change the Pipeline Arguments tab to select BlockingDataflowPipelineRunner, after creating a bucket and setting my project ID, hitting Run gives me:
Caused by: java.lang.IllegalArgumentException: Output path does not exist or is not writeable: gs://my-cloud-dataflow-bucket
at com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
at com.google.cloud.dataflow.sdk.util.DataflowPathValidator.verifyPathIsAccessible(DataflowPathValidator.java:79)
at com.google.cloud.dataflow.sdk.util.DataflowPathValidator.validateOutputFilePrefixSupported(DataflowPathValidator.java:62)
at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.fromOptions(DataflowPipelineRunner.java:255)
at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.fromOptions(BlockingDataflowPipelineRunner.java:82)
... 9 more
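For reference, here is roughly what that run configuration corresponds to in code. This is only a minimal sketch based on the Dataflow SDK 1.x classes visible in the stack trace; the class name, project ID, and staging path are placeholders standing in for my own values:

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner;

public class WordCountLauncher {
    public static void main(String[] args) {
        // Equivalent of the Eclipse "Pipeline Arguments" tab (placeholder values).
        DataflowPipelineOptions options =
                PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setRunner(BlockingDataflowPipelineRunner.class);
        options.setProject("my-project-id");                                  // my GCP project
        options.setStagingLocation("gs://my-cloud-dataflow-bucket/staging"); // bucket the validator checks

        Pipeline p = Pipeline.create(options);
        // ... transforms from the WordCount example would go here ...
        p.run();
    }
}

As I understand it, the runner validates the GCS staging/output path up front in DataflowPipelineRunner.fromOptions, which is where the IllegalArgumentException above is thrown.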
I have run 'gcloud auth login' from my terminal, and the browser confirms that I am successfully logged in.
I am really not sure what I have done wrong here. Can anyone confirm whether this is a known issue with the Dataflow pipeline and Google Cloud Storage buckets?
Thanks!
Can you run 'gsutil ls gs://my-cloud-dataflow-bucket' on the command-line? (I'll give a generic answer first, and then follow up with a more specific one once we figure out the root cause.) – Cabalist
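A rough Java equivalent of that gsutil check, sketched with the google-cloud-storage client (that library is not part of the original thread, so treat these calls as an assumption; the bucket name is the one from the error above):

import com.google.cloud.storage.Bucket;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class BucketCheck {
    public static void main(String[] args) {
        // Uses the locally configured application default credentials.
        Storage storage = StorageOptions.getDefaultInstance().getService();
        Bucket bucket = storage.get("my-cloud-dataflow-bucket");
        if (bucket == null) {
            System.out.println("Bucket not found (or not visible to these credentials).");
        } else {
            System.out.println("Bucket exists: " + bucket.getName());
        }
    }
}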