apache-beam Questions
3
I'm wanting to run a continuous stream processing job using Beam on a Flink runner within Kubernetes. I've been following this tutorial here (https://python.plainenglish.io/apache-beam-flink-cluste...
Twayblade asked 18/7, 2023 at 12:8
2
Solved
We are attempting to use fixed windows on an Apache Beam pipeline (using DirectRunner). Our flow is as follows:
Pull data from pub/sub
Deserialize JSON into Java object
Window events w/ fixed win...
Deploy asked 16/5, 2017 at 21:23
2
I am trying to run a apache beam pipeline in Dataflow runner; The job reads data from a bigquery table and write data to a database.
I am running the job with classic template option in dataflow - ...
Concepcion asked 7/5, 2021 at 20:29
6
Is there an example of a Python Dataflow Flex Template with more than one file where the script is importing other files included in the same folder?
My project structure is like this:
├── pipeline...
Myriammyriameter asked 18/11, 2020 at 14:52
2
Solved
I have read through the Beam documentation and also looked through Python documentation but haven't found a good explanation of the syntax being used in most of the example Apache Beam code.
Can ...
Cliffhanger asked 5/5, 2017 at 3:32
6
Solved
I have been working on Apache Beam for a couple of days. I wanted to quickly iterate on the application I am working and make sure the pipeline I am building is error free. In spark we can use sc.p...
Catamnesis asked 25/9, 2017 at 13:26
2
Solved
In GCP we can see the pipeline execution graph. Is the same possible when running locally via DirectRunner?
Roughhew asked 12/6, 2022 at 14:9
2
Solved
Both DoFn and PTransform is a means to define operation for PCollection. How do we know which to use when?
Unmitigated asked 8/12, 2017 at 1:57
3
Solved
I would like to read a csv file and write it to BigQuery using apache beam dataflow. In order to do this I need to present the data to BigQuery in the form of a dictionary. How can I transform the ...
Glossematics asked 15/12, 2016 at 18:30
2
I'm using python beam on google dataflow, my pipeline looks like this:
Read image urls from file >> Download images >> Process images
The problem is that I can't let Download images step scale...
Yusuk asked 5/9, 2018 at 11:1
3
I have a bounded PCollection but i only want to get the first X amount of inputs and discard the rest. Is there a way to do this using Dataflow 2.X/ApacheBeam?
Extenuate asked 30/3, 2018 at 17:57
3
Solved
I'm currently new to using Apache Beam in Python with Dataflow runner. I'm interested in creating a batch pipeline that publishes to Google Cloud PubSub, I had tinkered with Beam Python APIs and fo...
Dias asked 24/4, 2019 at 5:21
4
I'd like to get some clarification on whether Cloud Dataflow or Cloud Composer is the right tool for the job, and I wasn't clear from the Google Documentation.
Currently, I'm using Cloud Dat...
Gunslinger asked 11/1, 2019 at 22:20
0
I am trying to use Apache Beam with Python to fetch JSON data from an API and write it to a BigQuery table. Here is the code I am using:
import argparse
import json
import requests
import apache_be...
Gang asked 24/4, 2023 at 10:54
3
I am having some issue with one of Dataflow jobs. From time to time I get this error messages. It seems that after this errors, the job keeps running fine, but, this night it actually stuck, or it ...
Lastly asked 16/4, 2021 at 8:50
3
I'm trying to figure out how to use Apache Beam to read large CSV files. By "large" I mean, several gigabytes (so that it would be impractical to read the entire CSV into memory at once).
So far, ...
Versify asked 20/7, 2018 at 9:17
5
Solved
I am new to Beam and struggling to find many good guides and resources to learn best practices.
One thing I have noticed is there are two ways pipelines are defined:
with beam.Pipeline() as p:
# ...
Unarm asked 6/7, 2019 at 12:49
3
Solved
Apache Beam supports multiple runner backends, including Apache Spark and Flink. I'm familiar with Spark/Flink and I'm trying to see the pros/cons of Beam for batch processing.
Looking at the Beam...
Orontes asked 24/4, 2017 at 6:26
3
I was reading the docs about SCHEMAS in Apache BEAM but i can not understand what its purpose is, how and why or in which cases should i need to use them. What is the difference between using schem...
Mai asked 16/6, 2020 at 16:13
1
We're facing issues during Dataflow jobs deployment.
The error
We are using CustomCommands to install private repo on workers, but we face now an error in the worker-startup logs of our jobs:
Ru...
Masonite asked 6/1, 2020 at 16:17
2
Solved
I've hit a problem with dockerized Apache Beam. When trying to run the container I am getting "No id provided." message and nothing more. Here's the code and files:
Dockerfile
FROM apache...
Calvillo asked 15/9, 2021 at 15:14
4
While in a distributed processing environment it is common to use "part" file names such as "part-000", is it possible to write an extension of some sort to rename the individual output file names ...
Luxuriate asked 9/10, 2017 at 3:29
1
I am facing the following error while running a streaming pipeline (python) in Apache Beam on Flinkrunner. The pipeline contains a GCP pub/sub io source and pub/sub target.
WARNING:root:Make sure t...
Nikolas asked 12/7, 2021 at 4:58
1
The goal is to store audit logging from different apps/jobs and be able to aggregate them by some ids. We chose to have BigQuery for that purpose and so we need to get a structured information from...
Mulvey asked 18/4, 2019 at 7:51
6
I'm having trouble submitting an Apache Beam example from a local machine to our cloud platform.
Using gcloud auth list I can see that the correct account is currently active. I can use gsutil and...
Bicentenary asked 25/5, 2017 at 14:32
1 Next >
© 2022 - 2024 — McMap. All rights reserved.