google-cloud-dataflow Questions
5
I am designing a solution in which Google Cloud SQL will be used to store all data from the regular functioning of the app (a kind of OLTP data). The data is expected to grow over time into pretty lar...
Eadmund asked 22/9, 2017 at 17:13
4
While it is common in a distributed processing environment to use "part" file names such as "part-000", is it possible to write an extension of some sort to rename the individual output file names ...
Luxuriate asked 9/10, 2017 at 3:29
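In the Beam Python SDK, one way to take control of individual output file names is fileio.WriteToFiles with a custom file_naming callable; a minimal sketch, assuming a hypothetical bucket path:

import apache_beam as beam
from apache_beam.io import fileio

def custom_naming(prefix):
    # Beam hands the naming callable the window, pane, shard index,
    # total shard count, compression and destination; we only use the shard info.
    def _name(window, pane, shard_index, total_shards, compression, destination):
        return f"{prefix}-{shard_index:03d}-of-{total_shards:03d}.txt"
    return _name

with beam.Pipeline() as p:
    (p
     | beam.Create(["alpha", "beta", "gamma"])
     | fileio.WriteToFiles(
           path="gs://my-bucket/out",  # hypothetical bucket
           file_naming=custom_naming("records")))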
5
Solved
I need to generate a SQL string using the Azure Data Flow expression builder, but it won't let me include a single quote inside my string using the concat function
I need to have a SQL string as below
SE...
Zondra asked 12/11, 2019 at 11:18
1
The goal is to store audit logging from different apps/jobs and be able to aggregate it by some ids. We chose BigQuery for that purpose, and so we need to get structured information from...
Mulvey asked 18/4, 2019 at 7:51
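A minimal sketch of landing structured audit records in BigQuery with the Beam Python SDK, run with the streaming flag; the subscription, table and schema here are all assumptions:

import json

import apache_beam as beam

# Assumed layout: every app emits a correlation id that aggregations key on.
SCHEMA = "app:STRING, correlation_id:STRING, message:STRING, ts:TIMESTAMP"

with beam.Pipeline() as p:
    (p
     | beam.io.ReadFromPubSub(
           subscription="projects/my-proj/subscriptions/audit")  # hypothetical
     | beam.Map(lambda payload: json.loads(payload.decode("utf-8")))
     | beam.io.WriteToBigQuery(
           "my-proj:audit.events",  # hypothetical table
           schema=SCHEMA,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))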
6
I'm having trouble submitting an Apache Beam example from a local machine to our cloud platform.
Using gcloud auth list I can see that the correct account is currently active. I can use gsutil and...
Bicentenary asked 25/5, 2017 at 14:32
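For submitting from a local machine, the usual Python-SDK pattern is to point the pipeline at DataflowRunner with explicit GCP options; a sketch, where the project, bucket and key path are all hypothetical:

import os

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The Beam/GCP client libraries pick credentials up from this variable.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/key.json"  # hypothetical

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",  # hypothetical
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    staging_location="gs://my-bucket/staging")

with beam.Pipeline(options=options) as p:
    p | beam.Create(["smoke test"]) | beam.Map(print)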
2
Solved
I am using the Terraform resource google_dataflow_flex_template_job to deploy a Dataflow flex template job.
resource "google_dataflow_flex_template_job" "streaming_beam" {
provider...
Inestimable asked 17/1, 2021 at 21:19
3
Solved
I am currently working on an ETL Dataflow job (using the Apache Beam Python SDK) which queries data from CloudSQL (with psycopg2 and a custom ParDo) and writes it to BigQuery. My goal is to create a...
Leaseholder asked 5/6, 2018 at 13:46
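A minimal sketch of that shape of job; the DSN, query, columns and table below are assumptions, and the connection is opened in setup() so each worker instance reuses it:

import apache_beam as beam

class ReadFromCloudSQL(beam.DoFn):
    def __init__(self, dsn, query):
        self.dsn = dsn
        self.query = query

    def setup(self):
        import psycopg2  # imported on the worker, not at pipeline build time
        self.conn = psycopg2.connect(self.dsn)

    def process(self, unused_seed):
        with self.conn.cursor() as cur:
            cur.execute(self.query)
            for row in cur:
                yield {"id": row[0], "value": row[1]}  # assumed columns

    def teardown(self):
        self.conn.close()

with beam.Pipeline() as p:
    (p
     | beam.Create([None])  # single seed element to trigger one read
     | beam.ParDo(ReadFromCloudSQL(
           "host=10.0.0.3 dbname=appdb user=beam",  # hypothetical DSN
           "SELECT id, value FROM source_table"))   # hypothetical query
     | beam.io.WriteToBigQuery(
           "my-proj:ds.target",                     # hypothetical table
           schema="id:INTEGER, value:STRING"))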
1
Solved
I am following this tutorial on migrating data from an Oracle database to a Cloud SQL PostgreSQL instance.
I am using the Google-provided streaming template "Datastream to PostgreSQL".
At a high level ...
Euthanasia asked 13/1, 2022 at 21:24
3
I was testing my Dataflow pipeline using DirectRunner from my Mac and got lots of "WARNING" messages like this; may I know how to get rid of them? There are so many that I cannot even see my de...
Jair asked 5/4, 2018 at 21:27
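With the DirectRunner those warnings come through Python's standard logging, so raising the log level before running the pipeline usually quiets them; a short sketch (the specific logger name is an assumption):

import logging

# Show only errors from everything...
logging.getLogger().setLevel(logging.ERROR)

# ...or silence just one noisy logger (this module name is an assumption).
logging.getLogger("oauth2client.transport").setLevel(logging.ERROR)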
1
Solved
I'm currently building a PoC Apache Beam pipeline in GCP Dataflow. In this case, I want to create a streaming pipeline with its main input from PubSub and a side input from BigQuery, and store processed data ...
Rescript asked 3/1, 2022 at 4:48
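A minimal sketch of that shape of pipeline in the Python SDK; the topic, query and table are hypothetical, and note that a bounded side input like this is read once at startup and does not refresh on its own (support for BigQuery reads inside streaming Python pipelines has also varied by SDK version):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

opts = PipelineOptions(streaming=True)

with beam.Pipeline(options=opts) as p:
    # Side input: a lookup table assumed small enough to hold as a dict.
    lookup = (p
              | "ReadBQ" >> beam.io.ReadFromBigQuery(
                    query="SELECT key, value FROM ds.lookup",  # hypothetical
                    use_standard_sql=True)
              | "ToKV" >> beam.Map(lambda row: (row["key"], row["value"])))

    (p
     | "ReadPubSub" >> beam.io.ReadFromPubSub(
           topic="projects/my-proj/topics/events")             # hypothetical
     | "Decode" >> beam.Map(lambda b: b.decode("utf-8"))
     | "Enrich" >> beam.Map(
           lambda msg, table: {"msg": msg, "value": table.get(msg)},
           table=beam.pvalue.AsDict(lookup))
     | "Write" >> beam.io.WriteToBigQuery(
           "my-proj:ds.enriched",                              # hypothetical
           schema="msg:STRING, value:STRING"))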
2
I have run the code below on 522 gzip files (around 100 GB); after decompression this is around 320 GB of data in protobuf format, and the output is written to GCS. I have used n1 standard m...
Colbert asked 9/1, 2021 at 12:39
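One thing that often dominates runtimes like this: gzip is not a splittable format, so each file is decompressed by a single worker and parallelism is capped at the number of files (here 522). A sketch with the text reader; the bucket pattern is hypothetical, and the same caveat applies to any gzipped input, including protobuf records:

import apache_beam as beam
from apache_beam.io.filesystem import CompressionTypes

with beam.Pipeline() as p:
    (p
     | beam.io.ReadFromText(
           "gs://my-bucket/input/*.gz",  # hypothetical pattern
           compression_type=CompressionTypes.GZIP)
     | beam.Map(len)  # placeholder processing
     | beam.io.WriteToText("gs://my-bucket/output/part"))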
1
I am using the Go SDK with Apache Beam to build a simple Dataflow pipeline that will get data from a query and publish the data to pub/sub with the following code:
package main
import (
"con...
Cornstarch asked 20/10, 2021 at 19:00
2
Solved
I'm trying to run my Python Dataflow job with a flex template. The job works fine locally when I run it with the direct runner (without the flex template); however, when I try to run it with the flex template, the job gets stuck ...
Kippy asked 13/11, 2020 at 0:14
1
Solved
I am writing a Splittable DoFn to read a MongoDB change stream.
It allows me to observe events describing changes to a collection, and I can start reading at any cluster timestamp I want, ...
Burkhart asked 27/9, 2021 at 9:40
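For reference, the Python SDF skeleton for a source like this, heavily simplified: the restriction models cluster timestamps as an offset range, and fetch_events is a hypothetical stub standing in for the change-stream cursor:

import apache_beam as beam
from apache_beam.io.restriction_trackers import OffsetRange, OffsetRestrictionTracker

def fetch_events(ts):
    # Hypothetical stub: would wrap a MongoDB change-stream cursor
    # positioned at cluster timestamp `ts`.
    return []

class TimestampRangeProvider(beam.transforms.core.RestrictionProvider):
    def initial_restriction(self, element):
        start_ts, end_ts = element  # the element carries the range to read
        return OffsetRange(start_ts, end_ts)

    def create_tracker(self, restriction):
        return OffsetRestrictionTracker(restriction)

    def restriction_size(self, element, restriction):
        return restriction.size()

class ReadChangeStream(beam.DoFn):
    def process(
            self,
            element,
            tracker=beam.DoFn.RestrictionParam(TimestampRangeProvider())):
        ts = tracker.current_restriction().start
        while tracker.try_claim(ts):  # claim each position before reading it
            for event in fetch_events(ts):
                yield event
            ts += 1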
1
I need to convert these two gcloud commands, which build and run Dataflow jobs, to Terraform.
gcloud dataflow flex-template build ${TEMPLATE_PATH} \
--image-gcr-path "${TARGET_GCR_IMAGE}" \
...
Concupiscent asked 27/9, 2021 at 14:17
3
I am running a streaming Apache Beam pipeline in Google Dataflow. It reads data from Kafka and streaming-inserts into BigQuery.
But in the BigQuery streaming insert step it's throwing a large numb...
Trodden asked 1/6, 2021 at 8:58
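One common way to keep such errors from flooding a streaming job is to cap retries and route rejected rows to a dead-letter output; the table and schema here are assumptions, and beam.Create stands in for the Kafka read:

import apache_beam as beam
from apache_beam.io.gcp.bigquery import BigQueryWriteFn
from apache_beam.io.gcp.bigquery_tools import RetryStrategy

with beam.Pipeline() as p:
    rows = p | beam.Create([{"id": "1", "payload": "ok"}])  # stand-in for Kafka

    result = rows | beam.io.WriteToBigQuery(
        "my-proj:ds.events",                 # hypothetical table
        schema="id:STRING, payload:STRING",  # assumed schema
        method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
        insert_retry_strategy=RetryStrategy.RETRY_ON_TRANSIENT_ERROR)

    # Rows the streaming insert rejects (e.g. schema mismatches) come back
    # here rather than failing the step over and over.
    (result[BigQueryWriteFn.FAILED_ROWS]
     | beam.Map(lambda bad: print("dead-letter:", bad)))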
4
Solved
I am trying to set up a controller service account for Dataflow. In my Dataflow options I have:
options.setGcpCredential(GoogleCredentials.fromStream(
new FileInputStream("key.json")).createScop...
Levigate asked 12/12, 2018 at 9:07
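For reference, the controller (worker) service account is selected by a pipeline option and is separate from the credential used to submit the job: --serviceAccount in the Java SDK, or service_account_email in Python, as in this sketch with a hypothetical account:

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",  # hypothetical
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    # Workers run as this account; it is distinct from the submission credential.
    service_account_email="controller@my-project.iam.gserviceaccount.com")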
2
Before seeing:
RuntimeError: IOError: [Errno 2] No such file or directory:
'/beam-temp-andrew_mini_vocab-..../......andrew_mini_vocab' [while running .....]
in my Apache Beam Python Dataflow job...
Hombre asked 12/12, 2017 at 18:00
1
I'm testing some pipeline on a small set of data and then suddenly my pipeline breaks down during one of the test runs with this message: Not found: Dataset thijs-dev:nlthijs_ba was not found in lo...
Fatuity asked 16/2, 2020 at 10:22
1
While adapting Java's KafkaIOIT to work with a large dataset I encountered a problem. I want to push 100M records through a Kafka topic, verify data correctness and at the same time check t...
Portion asked 12/9, 2019 at 7:26
2
I am using zsh, and I have installed gcloud in order to interact with GCP via the local terminal on my Mac. I am encountering this error: “zsh: no matches found: apache-beam[gcp]”. However, when I run t...
Scaffold asked 11/3, 2020 at 14:21
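In zsh, square brackets are glob characters, so apache-beam[gcp] is expanded as a filename pattern before pip ever sees it; quoting the requirement, as in pip install 'apache-beam[gcp]', is the usual fix.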
2
Solved
I've created a standard PubSub-to-BigQuery Dataflow job. However, to ensure I wasn't going to run up a huge bill while offline, I cancelled the job. From the GCP console, there doesn't se...
Terrilynterrine asked 3/1, 2018 at 18:03
2
My Apache Beam pipeline implements custom Transforms and ParDos in Python modules that in turn import other modules written by me. On the local runner this works fine, as all the files are ava...
Tonneau asked 10/7, 2018 at 9:45
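The usual fix for local modules that never reach the remote workers is to package them and hand the pipeline a setup file; a minimal sketch, with the package name assumed:

# setup.py at the pipeline root; find_packages() picks up your modules.
import setuptools

setuptools.setup(
    name="my_pipeline",  # hypothetical package name
    version="0.1.0",
    packages=setuptools.find_packages(),
)

The job is then launched with --setup_file ./setup.py (or setup_file="./setup.py" in PipelineOptions), so Dataflow installs the package on every worker.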
3
Solved
I'm currently trying to use Dataflow with Pub/Sub, but I'm getting this error:
Workflow failed. Causes: (6e74e8516c0638ca): There was a problem refreshing your credentials. Please check:
1. Dataflo...
Crumb asked 2/5, 2017 at 16:25
1
Solved
I want to publish messages with some attributes to a Pub/Sub topic via a Dataflow job in batch mode.
My Dataflow pipeline is written with Python 3.8 and apache-beam 2.27.0
It works with the @Anku...
Bibliotaph asked 26/3, 2021 at 17:21
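A minimal sketch of publishing with attributes via the native sink, assuming a hypothetical topic; with_attributes=True makes the sink expect PubsubMessage objects rather than raw bytes:

import apache_beam as beam
from apache_beam.io.gcp.pubsub import PubsubMessage

with beam.Pipeline() as p:
    (p
     | beam.Create([{"id": "1", "body": b"hello"}])
     | beam.Map(lambda r: PubsubMessage(
           data=r["body"],
           attributes={"id": r["id"]}))          # attribute values must be str
     | beam.io.WriteToPubSub(
           topic="projects/my-proj/topics/out",  # hypothetical topic
           with_attributes=True))

Note that the native Pub/Sub sink has historically been supported on Dataflow only in streaming mode, which is often why batch jobs fall back to a custom DoFn around the google-cloud-pubsub client.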