apache-beam Questions

3

Solved

I want to understand in which scenarios I should use FlatMap or Map. The documentation did not seem clear to me. I still do not understand in which scenario I should use the transformation of ...
Poult asked 14/8, 2017 at 9:1
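A minimal sketch of the distinction, assuming the Python SDK: Map emits exactly one output per input, while FlatMap returns an iterable whose elements are flattened into the output PCollection.

```python
import apache_beam as beam

with beam.Pipeline() as p:
    words = p | beam.Create(["a b", "c"])
    # Map is strictly 1:1 - each input element yields exactly one output.
    lengths = words | "Lengths" >> beam.Map(len)          # -> 3, 1
    # FlatMap is 1:N - the callable returns an iterable, which is flattened,
    # so one input may yield zero, one, or many outputs.
    tokens = words | "Tokens" >> beam.FlatMap(str.split)  # -> "a", "b", "c"
```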

0

I'm building some data streaming pipelines that read from Kafka and write to various sinks using Google Cloud Dataflow. The pipeline looks something like this (simplified): // Example pipeline tha...
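For context, a minimal Python sketch of that shape; the broker address, topic, table, and schema below are placeholders, not details from the question:

```python
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka

with beam.Pipeline() as p:
    records = (
        p
        | ReadFromKafka(
            consumer_config={"bootstrap.servers": "broker:9092"},
            topics=["events"])
        # ReadFromKafka yields (key, value) byte pairs; decode the payload.
        | beam.Map(lambda kv: {"payload": kv[1].decode("utf-8")})
    )
    # Fan out: applying several sink transforms to the same PCollection
    # sends every element to each sink.
    records | "ToBigQuery" >> beam.io.WriteToBigQuery(
        "project:dataset.events", schema="payload:STRING")
```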

2

Could somebody please clarify the expected behavior when using save_main_session with custom modules imported in __main__? My Dataflow pipeline imports two non-standard modules - one via requirements....
Lobo asked 12/7, 2018 at 17:21
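The short version of the documented behavior: save_main_session pickles the __main__ namespace so module-level names are available on workers. A minimal sketch (the json import stands in for any module imported at top level):

```python
import json  # a module imported at the top level of __main__

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

def run():
    options = PipelineOptions()
    # Pickle the __main__ namespace so module-level imports such as `json`
    # above are visible inside callables executed on remote workers.
    options.view_as(SetupOptions).save_main_session = True
    with beam.Pipeline(options=options) as p:
        p | beam.Create(['{"a": 1}']) | beam.Map(json.loads)

if __name__ == "__main__":
    run()
```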

1

Solved

It's evident that preemptible instances are cheaper than non-preemptible instances. On a daily basis, 400-500 Dataflow jobs run in my organisation's project. Some of these jobs are time-sens...
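Dataflow's answer to preemptible pricing for batch jobs is FlexRS, which runs on a mix of preemptible and regular VMs at a discount in exchange for flexible (possibly delayed) scheduling, so it suits the non-time-sensitive jobs. A sketch with placeholder project, region, and bucket names:

```python
from apache_beam.options.pipeline_options import PipelineOptions

# FlexRS (flexrs_goal) trades scheduling flexibility for a discount; all
# names below are placeholders.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    flexrs_goal="COST_OPTIMIZED",
)
```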

1

Solved

We're trying to build a pipeline that takes data from BigQuery and runs it through TensorFlow Transform before training in TensorFlow. The pipeline is up and running, but we're having difficulty with n...

1

Solved

I am trying to implement CDC in Apache Beam, deployed on Google Cloud Dataflow. I have unloaded the master data and the new data, which is expected to come in daily. The join is not working as e...
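One common way to express such a join in the Python SDK is CoGroupByKey over the two keyed collections; a minimal sketch with hypothetical keys and records:

```python
import apache_beam as beam

def merge(element):
    key, groups = element
    daily = list(groups["daily"])
    master = list(groups["master"])
    # A simple CDC upsert: the daily record, if present, supersedes master.
    return key, (daily or master)[-1]

with beam.Pipeline() as p:
    master = p | "Master" >> beam.Create([("k1", {"v": "old"})])
    daily = p | "Daily" >> beam.Create([("k1", {"v": "new"})])
    joined = (
        {"master": master, "daily": daily}
        | beam.CoGroupByKey()  # (key, {"master": [...], "daily": [...]})
        | beam.Map(merge)
    )
```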

2

Solved

I have a very basic Python Dataflow job that reads some data from Pub/Sub, applies a FixedWindow, and writes to Google Cloud Storage: transformed = ... transformed | beam.io.WriteToText(known_args....
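The usual catch here is that the plain WriteToText sink does not handle unbounded input in older Python SDKs; windowing plus fileio.WriteToFiles is one workaround. A sketch with a placeholder subscription and bucket:

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.io import fileio
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    (p
     | beam.io.ReadFromPubSub(subscription="projects/p/subscriptions/s")
     | beam.Map(lambda b: b.decode("utf-8"))
     | beam.WindowInto(window.FixedWindows(60))  # 1-minute windows
     # fileio.WriteToFiles supports windowed, unbounded input, unlike the
     # plain text sink in older SDK versions.
     | fileio.WriteToFiles(path="gs://my-bucket/output/"))
```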

2

Solved

I am trying to transfer data from one BigQuery table to another through Beam; however, the following error comes up: WARNING:root:Retry with exponential backoff: waiting for 4.12307941111 seconds before retryi...
Helle asked 29/7, 2019 at 10:1
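For reference, the basic table-to-table shape in a recent Python SDK (table names and schema are placeholders); the retry warning itself usually wraps an underlying BigQuery error that surfaces later in the log:

```python
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | beam.io.ReadFromBigQuery(table="project:source_dataset.source_table")
     | beam.io.WriteToBigQuery(
         "project:dest_dataset.dest_table",
         schema="field:STRING",  # placeholder schema
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```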

1

Solved

I am writing a data pipeline in Apache Beam that reads from Pub/Sub, deserializes the messages into JSONObjects, and passes them to some other pipeline stages. The issue is, when I try to submit my cod...
Seraphina asked 10/12, 2019 at 22:12
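The submission error in this family of questions is typically a serialization problem: parse inside a DoFn so nothing non-serializable is captured at construction time. A Python-flavored sketch of that pattern:

```python
import json
import apache_beam as beam

class ParseJson(beam.DoFn):
    def process(self, message: bytes):
        # Building the parsed object here, on the worker, avoids capturing
        # any non-serializable state when the pipeline graph is constructed.
        yield json.loads(message.decode("utf-8"))

with beam.Pipeline() as p:
    (p
     | beam.Create([b'{"id": 1}'])  # stand-in for a Pub/Sub read
     | beam.ParDo(ParseJson())
     | beam.Map(print))
```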

1

Solved

EDIT: I got this to work using beam.io.WriteToBigQuery with the sink experimental option turned on. I actually had it on, but my issue was that I was trying to "build" the full table reference from two v...
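A sketch of the working pattern described in the edit, assuming a templated pipeline: expose the full project:dataset.table reference as a single runtime parameter rather than assembling it from parts (the option name below is hypothetical):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class MyOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # One runtime parameter carrying the complete table reference.
        parser.add_value_provider_argument(
            "--output_table", type=str,
            help="Full reference: project:dataset.table")

options = PipelineOptions().view_as(MyOptions)
with beam.Pipeline(options=options) as p:
    (p
     | beam.Create([{"name": "a"}])
     | beam.io.WriteToBigQuery(options.output_table, schema="name:STRING"))
```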

1

Solved

In my pipeline I use WriteToBigQuery something like this: | beam.io.WriteToBigQuery( 'thijs:thijsset.thijstable', schema=table_schema, write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND...

2

Solved

I have a Dataflow job that reads data from Pub/Sub and, based on the time and filename, writes the contents to GCS, where the folder path is based on YYYY/MM/DD. This allows files to be generated i...
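One way to get dated folders in the Python SDK is fileio.WriteToFiles with a destination function; a sketch assuming each element starts with an ISO timestamp (a placeholder scheme):

```python
import apache_beam as beam
from apache_beam.io import fileio

def date_prefix(line):
    # Turn a leading "2019-05-01..." timestamp into a "2019/05/01" prefix.
    return line[:10].replace("-", "/")

with beam.Pipeline() as p:
    (p
     | beam.Create(["2019-05-01T12:00:00,hello"])
     | fileio.WriteToFiles(
         path="gs://my-bucket/output",
         destination=date_prefix,
         # The destination string becomes the file-name prefix, so the
         # slashes in it yield dated subfolders under the output path.
         file_naming=fileio.destination_prefix_naming()))
```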

2

Solved

I have a use case where I read newline-delimited JSON elements stored in Google Cloud Storage and start processing each JSON object. While processing each one, I have to call an external API for doing de-...
Twenty asked 17/11, 2019 at 17:28
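Calling the API once per element is usually the bottleneck; grouping elements with BatchElements amortizes the round trips. A sketch where call_api is a hypothetical stand-in for the external de-identification service:

```python
import json
import apache_beam as beam

def call_api(batch):
    # Hypothetical external call: one request for a whole batch of records.
    return [{**record, "deidentified": True} for record in batch]

with beam.Pipeline() as p:
    (p
     | beam.io.ReadFromText("gs://my-bucket/input.json")  # newline-delimited JSON
     | beam.Map(json.loads)
     | beam.BatchElements(min_batch_size=10, max_batch_size=100)
     | beam.FlatMap(call_api))
```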

1

I need some help, please. I'm trying to use Apache Beam with the RabbitMqIO source (version 2.11.0) and the AfterWatermark.pastEndOfWindow trigger. It seems like RabbitMqIO's watermark doesn't advance a...
Cassiecassil asked 17/4, 2019 at 22:4

1

Solved

Right now I'm only able to grab the RunTime value inside a class using a ParDo. Is there another way to use the runtime parameter, for example in my functions? This is the code I have right now: c...
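The usual pattern is to hand the ValueProvider itself to the DoFn (or function) and call .get() only at execution time; a sketch with a hypothetical --suffix option:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class MyOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument("--suffix", type=str)

class AddSuffix(beam.DoFn):
    def __init__(self, suffix):
        self.suffix = suffix  # store the ValueProvider, not its value

    def process(self, element):
        # .get() is only valid at execution time, once the runtime
        # parameter has been bound.
        yield element + self.suffix.get()

options = PipelineOptions().view_as(MyOptions)
with beam.Pipeline(options=options) as p:
    p | beam.Create(["a"]) | beam.ParDo(AddSuffix(options.suffix))
```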

3

Solved

I'm very new to GCP and Dataflow. However, I would like to start testing and deploying a few flows harnessing Dataflow on GCP. According to the documentation, everything around Dataflow is imper...

1

Solved

Beam's great power comes from its advanced windowing capabilities, but they're also a bit confusing. Having seen some oddities in local tests (I use RabbitMQ as an input source) where messages were n...
Allonym asked 3/10, 2019 at 14:32

2

I am building a pipeline that reads Avro generic records. To pass GenericRecord between stages I need to register an AvroCoder. The documentation says that if I use a generic record, the schema argument...
Dotson asked 13/12, 2018 at 14:53

1

Solved

I have a Dataflow job that is not making progress - or it is making very slow progress, and I do not know why. How can I start looking into why the job is slow / stuck?
Sural asked 26/9, 2019 at 6:2

2

Solved

When trying to run a pipeline on the Dataflow service, I specify the staging and temp buckets (in GCS) on the command line. When the program executes, I get a RuntimeException before my pipeline ru...
Alary asked 31/5, 2017 at 17:36

3

The Dataflow pipelines developed by my team suddenly started getting stuck, stopping processing our events. Their worker logs became full of warning messages saying that one specific step got stuck...
Coinstantaneous asked 4/3, 2019 at 19:39

1

Does Apache Beam 2.12.0 support Java 11, or should I still stick with the stable Java 8 SDK for now? I see the site recommends Python 3.5 with Beam 2.12.0 per the documentation, compared to...
Gal asked 28/8, 2019 at 21:9

2

Solved

I want to use Dataflow to move data from Pub/Sub to GCS. Basically, I want Dataflow to accumulate messages for a fixed amount of time (15 minutes, for example), then write that data as text f...

1

I am trying to write a Parquet file as follows in Apache Beam using Snappy compression: records.apply(FileIO.<GenericRecord>write().via(ParquetIO.sink(schema)).to(options.getOutput())); I se...
Lillalillard asked 29/11, 2018 at 16:28

1

I am trying to merge two JSON inputs (this example is from a file, but it will be from a Google Pub/Sub input later) from these: orderID.json: {"orderID":"test1","orderPacked":"Yes","orderSubmitted...
Damiandamiani asked 7/8, 2019 at 5:52
