google-cloud-dataflow Questions
2
Solved
We are attempting to use fixed windows on an Apache Beam pipeline (using DirectRunner). Our flow is as follows:
Pull data from pub/sub
Deserialize JSON into Java object
Window events w/ fixed win...
Deploy asked 16/5, 2017 at 21:23
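Fixed (tumbling) windows bucket each event by its timestamp in fixed, non-overlapping intervals. A minimal pure-Python sketch of that assignment logic, independent of Beam (the function name is illustrative, not Beam API):

```python
from collections import defaultdict

def assign_fixed_windows(events, window_size_s):
    """Bucket (timestamp, value) events into fixed, non-overlapping windows.

    Each window is identified by its start time, (ts // size) * size,
    mirroring what Beam's FixedWindows does per element.
    """
    windows = defaultdict(list)
    for ts, value in events:
        window_start = (ts // window_size_s) * window_size_s
        windows[window_start].append(value)
    return dict(windows)

events = [(3, "a"), (12, "b"), (14, "c"), (25, "d")]
print(assign_fixed_windows(events, 10))
# {0: ['a'], 10: ['b', 'c'], 20: ['d']}
```

In Beam itself this assignment happens inside `beam.WindowInto(beam.window.FixedWindows(size))`, with the additional complication of watermarks and triggers for late pub/sub data.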
5
Solved
Some files get uploaded on a daily basis to an FTP server and I need those files under Google Cloud Storage. I don't want to bug the users that upload the files to install any additional software a...
Peele asked 19/4, 2017 at 4:36
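One low-friction pattern that asks nothing of the uploaders is to stream each file straight through to GCS from a small VM or Cloud Shell. A sketch assuming `curl` and `gsutil` are available; the host, credentials, paths, and bucket are placeholders:

```shell
# Stream the file from the FTP server into GCS without touching local disk;
# `-` tells gsutil cp to read the object contents from stdin.
curl -s "ftp://user:pass@ftp.example.com/daily/report.csv" \
  | gsutil cp - gs://my-bucket/daily/report.csv
```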
5
Solved
I just need to run a dataflow pipeline on a daily basis, but suggested solutions like App Engine Cron Service, which requires building a whole web app, seem a bit too much.
I w...
Nasho asked 6/5, 2017 at 4:24
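A lighter-weight option than a full App Engine app is Cloud Scheduler invoking the Dataflow templates API on a cron schedule. A sketch under that assumption; project, bucket, template, and service-account names are placeholders:

```shell
# Run the templated job once (what the scheduled trigger will do daily):
gcloud dataflow jobs run my-daily-job \
  --gcs-location gs://my-bucket/templates/my-template \
  --region us-central1

# Have Cloud Scheduler POST to the templates.launch endpoint every day at 06:00:
gcloud scheduler jobs create http my-daily-trigger \
  --schedule "0 6 * * *" \
  --uri "https://dataflow.googleapis.com/v1b3/projects/MY_PROJECT/locations/us-central1/templates:launch?gcsPath=gs://my-bucket/templates/my-template" \
  --http-method POST \
  --oauth-service-account-email scheduler@MY_PROJECT.iam.gserviceaccount.com
```

The service account needs permission to launch Dataflow jobs (e.g. the Dataflow Admin role).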
2
I am trying to run an Apache Beam pipeline on the Dataflow runner. The job reads data from a BigQuery table and writes data to a database.
I am running the job with the classic template option in Dataflow - ...
Concepcion asked 7/5, 2021 at 20:29
6
Is there an example of a Python Dataflow Flex Template with more than one file where the script is importing other files included in the same folder?
My project structure is like this:
├── pipeline...
Myriammyriameter asked 18/11, 2020 at 14:52
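The usual fix is to package the local modules so the workers can import them: add a `setup.py` next to the main script and point the flex template at it. A minimal sketch; the package name is a placeholder:

```python
# setup.py, placed at the project root so Dataflow workers
# receive the sibling modules the main script imports.
import setuptools

setuptools.setup(
    name="pipeline",
    version="0.0.1",
    packages=setuptools.find_packages(),  # picks up pipeline/ and friends
    install_requires=[],                  # pipeline dependencies go here
)
```

In the flex-template Dockerfile, setting `ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"` tells the launcher to install the package before running the pipeline.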
2
I'm setting up a simple Proof of Concept to learn some of the concepts in Google Cloud, specifically PubSub and Dataflow.
I have a PubSub topic greeting
I've created a simple cloud function that ...
Position asked 16/5, 2019 at 16:20
2
Solved
Both DoFn and PTransform are means of defining an operation on a PCollection. How do we know which to use when?
Unmitigated asked 8/12, 2017 at 1:57
3
Solved
My use case involves fetching the job IDs of all streaming Dataflow jobs in my project and cancelling them, then updating the sources for my Dataflow job and re-running it.
I am trying to achieve this using ...
Kremenchug asked 20/7, 2020 at 8:15
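Besides the API clients, the same cancel loop can be scripted with gcloud. A sketch; the region is a placeholder, and the `--filter` key is an assumption worth checking against `gcloud dataflow jobs list --help`:

```shell
# List active streaming jobs and cancel each one by ID.
for job_id in $(gcloud dataflow jobs list \
    --status=active --region=us-central1 \
    --filter="type=Streaming" --format="value(id)"); do
  gcloud dataflow jobs cancel "$job_id" --region=us-central1
done
```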
3
Solved
I would like to read a CSV file and write it to BigQuery using Apache Beam on Dataflow. In order to do this I need to present the data to BigQuery in the form of a dictionary. How can I transform the ...
Glossematics asked 15/12, 2016 at 18:30
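The per-row transform is just CSV line to dict keyed by the BigQuery column names; a pure-Python sketch of that mapping (the column names are illustrative), which would typically run inside a `beam.Map` before `WriteToBigQuery`:

```python
import csv
import io

SCHEMA_FIELDS = ["name", "age", "city"]  # must match the BigQuery table schema

def csv_line_to_bq_row(line):
    """Turn one CSV line into the dict form BigQuery sinks expect."""
    values = next(csv.reader(io.StringIO(line)))  # handles quoting/commas
    return dict(zip(SCHEMA_FIELDS, values))

print(csv_line_to_bq_row("alice,30,paris"))
# {'name': 'alice', 'age': '30', 'city': 'paris'}
```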
2
I'm using Python Beam on Google Dataflow, and my pipeline looks like this:
Read image urls from file >> Download images >> Process images
The problem is that I can't let the Download images step scale...
Yusuk asked 5/9, 2018 at 11:1
3
I have a bounded PCollection but I only want to get the first X inputs and discard the rest. Is there a way to do this using Dataflow 2.X / Apache Beam?
Extenuate asked 30/3, 2018 at 17:57
3
Solved
I'm currently new to using Apache Beam in Python with Dataflow runner. I'm interested in creating a batch pipeline that publishes to Google Cloud PubSub, I had tinkered with Beam Python APIs and fo...
Dias asked 24/4, 2019 at 5:21
4
I'd like to get some clarification on whether Cloud Dataflow or Cloud Composer is the right tool for the job; it wasn't clear to me from the Google documentation.
Currently, I'm using Cloud Dat...
Gunslinger asked 11/1, 2019 at 22:20
3
I am having some issues with one of my Dataflow jobs. From time to time I get these error messages. After the errors the job usually keeps running fine, but last night it actually got stuck, or it ...
Lastly asked 16/4, 2021 at 8:50
4
Solved
I'd like to specify a minimum number of workers for my job that autoscaling will not go below (akin to how it works for max_num_workers). Is this possible? My reason is that sometimes the worker st...
Overspill asked 14/8, 2018 at 20:0
3
Dataflow job is failing with the below exception when I pass the staging, temp & output GCS bucket locations as parameters.
Java code:
final String[] used = Arrays.copyOf(args, args.length + 1);
used[us...
Cruet asked 10/5, 2018 at 6:32
2
Is there any guidance available to use Google Cloud SQL as a Dataflow read source and/or sink?
At the Apache Beam Python SDK 2.1.0 documentation there isn't a chapter mentioning Google Cloud SQL.
...
Valley asked 2/10, 2017 at 15:9
3
Solved
My company is evaluating if we can use Google Dataflow.
I have run a Dataflow job on Google Cloud Platform. The console shows 5 hr 25 minutes in the "Reserved CPU Time" field on the right.
Worke...
Sudiesudnor asked 14/1, 2016 at 13:12
7
I am using Google Dataflow to implement an ETL data warehouse solution.
Looking into the Google Cloud offering, it seems Dataproc can also do the same thing.
It also seems Dataproc is a little bit ...
Hildegaard asked 26/9, 2017 at 22:36
5
Solved
I am new to Beam and struggling to find many good guides and resources to learn best practices.
One thing I have noticed is there are two ways pipelines are defined:
with beam.Pipeline() as p:
# ...
Unarm asked 6/7, 2019 at 12:49
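The two styles do the same thing: the `with` form simply calls `run()` and waits when the block exits cleanly, so no explicit `p.run()` is needed. A toy sketch of the pattern (not the real Pipeline class):

```python
class ToyPipeline:
    """Mimics why `with beam.Pipeline() as p:` needs no explicit run():
    the context manager's __exit__ runs the pipeline and blocks."""
    def __init__(self):
        self.ran = False
    def run(self):
        self.ran = True
        return self
    def wait_until_finish(self):
        return "DONE"
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:              # only auto-run on a clean exit
            self.run().wait_until_finish()

with ToyPipeline() as p:
    pass                                  # build the pipeline graph here
print(p.ran)   # True: run() happened on leaving the block
```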
6
Does someone know how to get the filename when using file pattern matching in google-cloud-dataflow?
I'm a newbie to Dataflow. How do I get the filename when using a file pattern match, in this way:
p.apply(TextIO.Re...
Zechariah asked 1/5, 2015 at 8:13
1
We're facing issues during Dataflow jobs deployment.
The error
We are using CustomCommands to install a private repo on the workers, but we now face an error in the worker-startup logs of our jobs:
Ru...
Masonite asked 6/1, 2020 at 16:17
1
My dataflow job has been failing since 7AM this morning with error:
Startup of the worker pool in zone europe-west3-c failed to bring up any of the desired 1 workers. ZONE_RESOURCE_POOL_EXHAUSTED:...
Johnnyjohnnycake asked 5/7, 2022 at 19:22
5
I am trying to follow this simple Dataflow example from the Google Cloud site.
I have successfully installed the Dataflow pipeline plugin and the gcloud SDK (as well as Python 2.7). I have also set up a ...
Armillda asked 19/3, 2016 at 10:16
2
Solved
I have a requirement to trigger a Cloud Dataflow pipeline from Cloud Functions, but the Cloud Function must be written in Java. The trigger for the Cloud Function is Google Cloud Storage's Finalis...
Writein asked 21/8, 2020 at 5:24
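Whatever language the function is written in, it ultimately just calls the Dataflow templates.launch API with the finalized object's path as a parameter. For reference, the equivalent gcloud invocation (job, bucket, template, and parameter names are placeholders):

```shell
gcloud dataflow jobs run gcs-triggered-job \
  --gcs-location gs://my-bucket/templates/my-template \
  --region us-central1 \
  --parameters inputFile=gs://my-bucket/uploads/new-file.csv
```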
© 2022 - 2024 — McMap. All rights reserved.