apache-beam Questions

3

I want to run a continuous stream processing job using Beam on a Flink runner within Kubernetes. I've been following this tutorial (https://python.plainenglish.io/apache-beam-flink-cluste...
Twayblade asked 18/7, 2023 at 12:8
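
A minimal sketch of how such a pipeline is typically pointed at a Flink job service from Python; the endpoints below are assumptions and would be the addresses of the job-server and worker-pool services exposed in the Kubernetes cluster:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Submit to a Flink job service via the portable runner. The endpoints are
    # hypothetical; point them at the flink job-server and beam worker-pool
    # services running in Kubernetes. A real streaming job would read from an
    # unbounded source (Kafka, Pub/Sub, ...) instead of Create.
    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=localhost:8099",
        "--environment_type=EXTERNAL",
        "--environment_config=localhost:50000",
    ])

    with beam.Pipeline(options=options) as p:
        (p
         | "Create" >> beam.Create(["hello", "beam on flink"])
         | "Print" >> beam.Map(print))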

2

Solved

We are attempting to use fixed windows on an Apache Beam pipeline (using DirectRunner). Our flow is as follows: pull data from Pub/Sub, deserialize JSON into a Java object, window events with fixed win...
Deploy asked 16/5, 2017 at 21:23
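
The original flow is Java, but here is a rough Python sketch of the same shape (the topic name and window size are assumptions), with the streaming flag set so Pub/Sub is treated as an unbounded source:

    import json
    import apache_beam as beam
    from apache_beam import window
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (p
         | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
         | "Parse" >> beam.Map(json.loads)                       # JSON -> dict per event
         | "Window" >> beam.WindowInto(window.FixedWindows(60))  # 60-second fixed windows
         | "Key" >> beam.Map(lambda e: (e.get("type", "unknown"), 1))
         | "CountPerWindow" >> beam.CombinePerKey(sum)
         | "Print" >> beam.Map(print))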

2

I am trying to run an Apache Beam pipeline on the Dataflow runner; the job reads data from a BigQuery table and writes data to a database. I am running the job with the classic template option in Dataflow - ...
Concepcion asked 7/5, 2021 at 20:29

6

Is there an example of a Python Dataflow Flex Template with more than one file where the script is importing other files included in the same folder? My project structure is like this: ├── pipeline...
Myriammyriameter asked 18/11, 2020 at 14:52
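
The usual way to make local modules importable on the Dataflow workers is to ship them as a package; a minimal sketch of a setup.py at the project root, assuming the extra files live in a pipeline/ package (name hypothetical):

    # setup.py -- packages the local "pipeline" module so Dataflow workers can
    # import it when the Flex Template job runs.
    import setuptools

    setuptools.setup(
        name="pipeline",
        version="0.1.0",
        packages=setuptools.find_packages(),   # picks up pipeline/ and submodules
        install_requires=["apache-beam[gcp]"],
    )

The file is then referenced either with the --setup_file pipeline option or, when using the Google-provided Flex Template base images, with the FLEX_TEMPLATE_PYTHON_SETUP_FILE environment variable in the template Dockerfile.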

2

Solved

I have read through the Beam documentation and also looked through the Python documentation, but haven't found a good explanation of the syntax used in most of the example Apache Beam code. Can ...
Cliffhanger asked 5/5, 2017 at 3:32
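
For reference, the unfamiliar syntax is plain Python operator overloading: | applies a PTransform to a pipeline or PCollection, and >> only attaches a label to the transform. A minimal example:

    import apache_beam as beam

    # "|" applies a transform; ">>" only attaches a string label to it.
    # Both pipelines below build the same graph.
    with beam.Pipeline() as p:
        # unlabeled form
        _ = p | beam.Create(["a", "b"]) | beam.Map(str.upper) | beam.Map(print)

    with beam.Pipeline() as p:
        # labeled form: "Name" >> transform
        _ = (p
             | "Create" >> beam.Create(["a", "b"])
             | "Upper" >> beam.Map(str.upper)
             | "Print" >> beam.Map(print))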

6

Solved

I have been working on Apache Beam for a couple of days. I wanted to quickly iterate on the application I am working on and make sure the pipeline I am building is error-free. In Spark we can use sc.p...
Catamnesis asked 25/9, 2017 at 13:26
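
A quick local-iteration pattern, roughly the Beam counterpart of sc.parallelize: build a tiny input with beam.Create and check the output with assert_that on the (default) DirectRunner:

    import apache_beam as beam
    from apache_beam.testing.test_pipeline import TestPipeline
    from apache_beam.testing.util import assert_that, equal_to

    # A small in-memory input plus an assertion runs in seconds and catches
    # most pipeline-wiring errors before submitting to a real cluster.
    with TestPipeline() as p:
        result = (p
                  | beam.Create([1, 2, 3])
                  | beam.Map(lambda x: x * 10))
        assert_that(result, equal_to([10, 20, 30]))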

2

Solved

In GCP we can see the pipeline execution graph. Is the same possible when running locally via DirectRunner?
Roughhew asked 12/6, 2022 at 14:9

2

Solved

Both DoFn and PTransform are means of defining operations on a PCollection. How do we know which to use when?
Unmitigated asked 8/12, 2017 at 1:57
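
Roughly: a DoFn holds per-element processing logic and is handed to ParDo, while a PTransform maps whole PCollections to PCollections and is the unit pipelines are composed from; composite PTransforms often wrap one or more ParDo(DoFn) steps. A small sketch:

    import apache_beam as beam
    from apache_beam.transforms import combiners

    class ParseLine(beam.DoFn):            # per-element logic, used via ParDo
        def process(self, element):
            yield element.strip().lower()

    class CleanAndCount(beam.PTransform):  # composite step built from other transforms
        def expand(self, pcoll):
            return (pcoll
                    | "Parse" >> beam.ParDo(ParseLine())
                    | "Count" >> combiners.Count.PerElement())

    with beam.Pipeline() as p:
        (p
         | beam.Create(["  A", "a ", "B"])
         | CleanAndCount()
         | beam.Map(print))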

3

Solved

I would like to read a CSV file and write it to BigQuery using Apache Beam on Dataflow. In order to do this I need to present the data to BigQuery in the form of a dictionary. How can I transform the ...
Glossematics asked 15/12, 2016 at 18:30
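
A minimal sketch of that shape, assuming a header of name,age,city and a table my-project:my_dataset.people (all hypothetical): each CSV line becomes a dict whose keys match the BigQuery column names.

    import csv
    import apache_beam as beam

    FIELDS = ["name", "age", "city"]

    def to_dict(line):
        # csv.reader handles quoting better than a plain split(",")
        row = dict(zip(FIELDS, next(csv.reader([line]))))
        row["age"] = int(row["age"])   # cast numeric columns explicitly
        return row

    with beam.Pipeline() as p:
        (p
         | beam.io.ReadFromText("gs://my-bucket/people.csv", skip_header_lines=1)
         | beam.Map(to_dict)
         | beam.io.WriteToBigQuery(
             "my-project:my_dataset.people",
             schema="name:STRING,age:INTEGER,city:STRING",
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))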

2

I'm using Python Beam on Google Dataflow; my pipeline looks like this: Read image URLs from file >> Download images >> Process images. The problem is that I can't let the Download images step scale...
Yusuk asked 5/9, 2018 at 11:1

3

I have a bounded PCollection but I only want to keep the first X elements and discard the rest. Is there a way to do this using Dataflow 2.X / Apache Beam?
Extenuate asked 30/3, 2018 at 17:57
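
One common approach (a sketch, not necessarily the only answer): Sample.FixedSizeGlobally keeps up to X arbitrary elements of a bounded PCollection and discards the rest. Since PCollections are unordered, "the first X" usually means "any X" in practice.

    import apache_beam as beam
    from apache_beam.transforms import combiners

    X = 100

    with beam.Pipeline() as p:
        (p
         | beam.Create(range(1000))
         | combiners.Sample.FixedSizeGlobally(X)    # -> a single list of X elements
         | beam.FlatMap(lambda xs: xs)              # back to individual elements
         | beam.Map(print))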

3

Solved

I'm new to using Apache Beam in Python with the Dataflow runner. I'm interested in creating a batch pipeline that publishes to Google Cloud Pub/Sub. I had tinkered with the Beam Python APIs and fo...
Dias asked 24/4, 2019 at 5:21
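
The usual catch is that beam.io.WriteToPubSub only works in streaming pipelines; a batch pipeline typically publishes from a DoFn using the google-cloud-pubsub client instead. A sketch (the topic name is hypothetical):

    import apache_beam as beam

    class PublishToPubSub(beam.DoFn):
        def __init__(self, topic):
            self.topic = topic

        def setup(self):
            # create the client once per worker process
            from google.cloud import pubsub_v1
            self.publisher = pubsub_v1.PublisherClient()

        def process(self, element):
            # publish() returns a future; .result() waits for the server ack
            self.publisher.publish(self.topic, element.encode("utf-8")).result()

    with beam.Pipeline() as p:
        (p
         | beam.Create(["message 1", "message 2"])
         | beam.ParDo(PublishToPubSub("projects/my-project/topics/my-topic")))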

4

I'd like to get some clarification on whether Cloud Dataflow or Cloud Composer is the right tool for the job, and it wasn't clear from the Google documentation. Currently, I'm using Cloud Dat...

0

I am trying to use Apache Beam with Python to fetch JSON data from an API and write it to a BigQuery table. Here is the code I am using: import argparse import json import requests import apache_be...
Gang asked 24/4, 2023 at 10:54
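
A minimal sketch of that shape, assuming the API returns a JSON array of flat objects and that the table and schema below exist (all hypothetical); the HTTP call happens inside the pipeline so it runs on the workers:

    import requests
    import apache_beam as beam

    def fetch(url):
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return response.json()           # -> list of dicts, one per record

    with beam.Pipeline() as p:
        (p
         | beam.Create(["https://example.com/api/items"])
         | beam.FlatMap(fetch)
         | beam.io.WriteToBigQuery(
             "my-project:my_dataset.items",
             schema="id:INTEGER,name:STRING",
             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))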

3

I am having an issue with one of my Dataflow jobs. From time to time I get these error messages. It seems that after these errors the job keeps running fine, but last night it actually got stuck, or it ...
Lastly asked 16/4, 2021 at 8:50

3

I'm trying to figure out how to use Apache Beam to read large CSV files. By "large" I mean several gigabytes (so that it would be impractical to read the entire CSV into memory at once). So far, ...
Versify asked 20/7, 2018 at 9:17
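
For plain CSVs (no quoted embedded newlines), ReadFromText already streams the file line by line and can split it across workers, so the whole file is never held in memory; a sketch:

    import csv
    import apache_beam as beam

    def parse(line):
        # parse one line at a time; nothing loads the whole file into memory
        return next(csv.reader([line]))

    with beam.Pipeline() as p:
        (p
         | beam.io.ReadFromText("gs://my-bucket/big.csv", skip_header_lines=1)
         | beam.Map(parse)
         | beam.Map(print))

CSVs with quoted embedded newlines need a different approach, for example the Beam DataFrame API's read_csv.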

5

Solved

I am new to Beam and struggling to find good guides and resources for learning best practices. One thing I have noticed is that there are two ways pipelines are defined: with beam.Pipeline() as p: # ...
Unarm asked 6/7, 2019 at 12:49
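
The two idioms build the same graph; the context-manager form simply runs the pipeline and waits for it when the with block exits. A side-by-side sketch:

    import apache_beam as beam

    # 1) Context manager: run() and wait_until_finish() are called on exit.
    with beam.Pipeline() as p:
        p | "Create1" >> beam.Create([1, 2, 3]) | "Print1" >> beam.Map(print)

    # 2) Explicit run: useful when you want the PipelineResult, e.g. to poll
    #    or cancel a job that keeps running after submission.
    p = beam.Pipeline()
    p | "Create2" >> beam.Create([1, 2, 3]) | "Print2" >> beam.Map(print)
    result = p.run()
    result.wait_until_finish()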

3

Solved

Apache Beam supports multiple runner backends, including Apache Spark and Flink. I'm familiar with Spark/Flink and I'm trying to see the pros/cons of Beam for batch processing. Looking at the Beam...
Orontes asked 24/4, 2017 at 6:26

3

I was reading the docs about schemas in Apache Beam but I cannot understand what their purpose is, how and why, or in which cases I should use them. What is the difference between using schem...
Mai asked 16/6, 2020 at 16:13
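
In short, a schema gives a PCollection's elements named, typed fields so transforms can refer to columns by name instead of positional lambdas. A small sketch using beam.Row elements (a NamedTuple element type works similarly); the field names are hypothetical:

    import apache_beam as beam

    with beam.Pipeline() as p:
        (p
         | beam.Create([
             beam.Row(user="alice", amount=10.0),
             beam.Row(user="alice", amount=5.0),
             beam.Row(user="bob", amount=7.5),
           ])
         # group and aggregate by field name -- no key-extraction lambdas
         | beam.GroupBy("user").aggregate_field("amount", sum, "total")
         | beam.Map(print))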

1

We're facing issues during Dataflow job deployment. We are using CustomCommands to install a private repo on the workers, but we now see an error in the worker-startup logs of our jobs: Ru...
Masonite asked 6/1, 2020 at 16:17

2

Solved

I've hit a problem with dockerized Apache Beam. When trying to run the container I get a "No id provided." message and nothing more. Here's the code and files: Dockerfile FROM apache...
Calvillo asked 15/9, 2021 at 15:14

4

While in a distributed processing environment it is common to use "part" file names such as "part-000", is it possible to write an extension of some sort to rename the individual output file names ...
Luxuriate asked 9/10, 2017 at 3:29
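
In the Python SDK, the "part"-style pieces of the name are configurable on WriteToText, and fully custom names can be produced with fileio.WriteToFiles and a file_naming callback. A minimal sketch of the former (paths hypothetical):

    import apache_beam as beam

    with beam.Pipeline() as p:
        (p
         | beam.Create(["row1", "row2"])
         | beam.io.WriteToText(
             "gs://my-bucket/output/report",    # file path prefix
             file_name_suffix=".csv",
             shard_name_template="-SS-of-NN",   # or "" with num_shards=1 for one file
             num_shards=2))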

1

I am facing the following error while running a streaming pipeline (Python) in Apache Beam on the FlinkRunner. The pipeline contains a GCP Pub/Sub IO source and a Pub/Sub target. WARNING:root:Make sure t...
Nikolas asked 12/7, 2021 at 4:58

1

The goal is to store audit logs from different apps/jobs and be able to aggregate them by some IDs. We chose BigQuery for that purpose, and so we need to get structured information from...

6

I'm having trouble submitting an Apache Beam example from a local machine to our cloud platform. Using gcloud auth list I can see that the correct account is currently active. I can use gsutil and...
Bicentenary asked 25/5, 2017 at 14:32
