apache-beam-io Questions

2

I am trying to run an Apache Beam pipeline on the Dataflow runner; the job reads data from a BigQuery table and writes data to a database. I am running the job with the classic template option in Dataflow - ...
Concepcion asked 7/5, 2021 at 20:29
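A minimal Python SDK sketch of the shape such a pipeline might take: read rows from BigQuery and write each one to a database from a DoFn. The query, table, and the open_database_connection helper are placeholders, not the asker's actual code.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class WriteRowToDb(beam.DoFn):
    def setup(self):
        # Hypothetical helper: open one database connection per worker.
        self.conn = open_database_connection()

    def process(self, row):
        # 'row' arrives as a dict keyed by BigQuery column names.
        self.conn.execute(
            "INSERT INTO target_table (id, value) VALUES (%s, %s)",
            (row["id"], row["value"]))

    def teardown(self):
        self.conn.close()

def run():
    options = PipelineOptions()  # template parameters would be declared here
    with beam.Pipeline(options=options) as p:
        (p
         | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
             query="SELECT id, value FROM `project.dataset.table`",
             use_standard_sql=True)
         | "WriteToDb" >> beam.ParDo(WriteRowToDb()))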

0

I'm trying out a simple example of reading data off a Kafka topic into Apache Beam. Here's the relevant snippet: with beam.Pipeline(options=pipeline_options) as pipeline: _ = ( pipeline | 'Read...
Vasoinhibitor asked 11/2, 2021 at 9:23
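A hedged sketch of how the rest of that snippet might look, assuming the Python SDK's cross-language Kafka source (which needs a Java expansion environment when the pipeline is built); the broker address and topic name are placeholders.

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

pipeline_options = PipelineOptions(streaming=True)
with beam.Pipeline(options=pipeline_options) as pipeline:
    _ = (
        pipeline
        | "ReadFromKafka" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "localhost:9092"},
            topics=["my_topic"])
        # Elements arrive as (key, value) tuples of bytes.
        | "DecodeValue" >> beam.Map(lambda kv: kv[1].decode("utf-8"))
        | "Print" >> beam.Map(print))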

3

Is there a way to read a multi-line CSV file using the ReadFromText transform in Python? I have a file that contains one line. I am trying to make Apache Beam read the input as one line, but cannot ...
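ReadFromText splits its input on newlines, so a record that spans lines will not come back as a single element from it; a common workaround is to read whole files with fileio and parse them with the csv module. A minimal sketch, with the input path as a placeholder:

import csv
import io
import apache_beam as beam
from apache_beam.io import fileio

def parse_whole_file(readable_file):
    # Read the entire file once so quoted fields containing newlines
    # stay inside a single CSV record.
    contents = readable_file.read_utf8()
    for record in csv.reader(io.StringIO(contents)):
        yield record

with beam.Pipeline() as p:
    _ = (p
         | "MatchFiles" >> fileio.MatchFiles("gs://my-bucket/input.csv")
         | "ReadMatches" >> fileio.ReadMatches()
         | "ParseCsv" >> beam.FlatMap(parse_whole_file)
         | "Print" >> beam.Map(print))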

2

Solved

See the code snippet below: I want ["metric1", "metric2"] to be the input to RunTask.process. However, it was run twice, with "metric1" and "metric2" respectively ...
Ungotten asked 23/7, 2020 at 8:38
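beam.Create treats every item of the iterable it receives as a separate element, which is why process ran once per string. A small sketch of the usual fix, wrapping the list so it stays together as one element (class and element names kept from the question):

import apache_beam as beam

class RunTask(beam.DoFn):
    def process(self, metrics):
        # With the nested list below, 'metrics' is ["metric1", "metric2"]
        # in a single call instead of one call per string.
        print(metrics)
        yield metrics

with beam.Pipeline() as p:
    _ = (p
         | beam.Create([["metric1", "metric2"]])  # outer list = one element
         | beam.ParDo(RunTask()))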

2

Solved

I have a use case where I read newline-delimited JSON elements stored in Google Cloud Storage and start processing each JSON record. While processing each record, I have to call an external API for doing de-...
Twenty asked 17/11, 2019 at 17:28
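A minimal Python SDK sketch of that setup: read newline-delimited JSON from GCS, batch the parsed records, and call an external API from a DoFn. The bucket path, endpoint, batch sizes, and response shape are all placeholders.

import json
import apache_beam as beam

class CallExternalApi(beam.DoFn):
    def setup(self):
        # One pooled HTTP session per worker.
        import requests
        self.session = requests.Session()

    def process(self, batch):
        # Hypothetical endpoint; one request per batch of parsed records.
        response = self.session.post(
            "https://example.com/process", json={"records": batch}, timeout=60)
        response.raise_for_status()
        for item in response.json()["records"]:
            yield item

with beam.Pipeline() as p:
    _ = (p
         | "ReadNdjson" >> beam.io.ReadFromText("gs://my-bucket/data/*.json")
         | "ParseJson" >> beam.Map(json.loads)
         | "Batch" >> beam.BatchElements(min_batch_size=10, max_batch_size=100)
         | "CallApi" >> beam.ParDo(CallExternalApi()))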

1

Solved

I'm setting up a slow-changing lookup map in my Apache Beam pipeline, which continuously updates the lookup map. For each key in the lookup map, I retrieve the latest value in the global window with acc...
Morelos asked 29/1, 2019 at 13:46
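This matches Beam's slowly-updating side input pattern. A sketch of that pattern using the Python SDK's PeriodicImpulse, refreshing the map on a timer and joining it to the main input as a side input; the loader function, keys, and refresh interval are placeholders.

import time
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window
from apache_beam.transforms.periodicsequence import PeriodicImpulse

REFRESH_SECONDS = 300

def load_lookup_map(_):
    # Placeholder: re-read the slow-changing lookup data from its source.
    return {"key1": "latest-value", "key2": "another-value"}

start = time.time()
options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    # Side branch: reload the map every REFRESH_SECONDS; apply_windowing=True
    # puts each refresh into its own fixed window of that length.
    lookup_side = (p
        | "PeriodicImpulse" >> PeriodicImpulse(
            start_timestamp=start,
            stop_timestamp=start + 3600,
            fire_interval=REFRESH_SECONDS,
            apply_windowing=True)
        | "LoadMap" >> beam.Map(load_lookup_map))

    # Main branch: timestamps and windows must line up with the side branch
    # so each element sees the map refreshed in its own window.
    _ = (p
        | "MainInput" >> beam.Create(["key1", "key2"])
        | "Timestamp" >> beam.Map(lambda k: window.TimestampedValue(k, start))
        | "WindowMain" >> beam.WindowInto(window.FixedWindows(REFRESH_SECONDS))
        | "Lookup" >> beam.Map(
            lambda k, m: (k, m.get(k)),
            m=beam.pvalue.AsSingleton(lookup_side))
        | "Print" >> beam.Map(print))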

2

Solved

BigQuery supports de-duplication for streaming inserts. How can I use this feature from Apache Beam? https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataconsistency To help ensur...
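As far as I understand the sink, Beam's streaming-insert path already attaches a generated insertId to each row, which BigQuery uses for the best-effort de-duplication described at that link. A hedged sketch that simply forces the streaming-insert method in the Python SDK (table and schema are placeholders):

import apache_beam as beam

with beam.Pipeline() as p:
    _ = (p
         | beam.Create([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
         | beam.io.WriteToBigQuery(
             "my-project:my_dataset.my_table",
             schema="id:INTEGER,name:STRING",
             # Streaming inserts: the sink supplies an insertId per row,
             # which BigQuery uses for best-effort de-duplication.
             method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))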

1

Solved

I am using Apache Beam to run some data transformations, which include data extraction from txt, csv, and other data sources. One thing I noticed is the difference in results when using be...
Hoff asked 24/12, 2018 at 11:35

6

Solved

I wanted to take advantage of the new BigQuery time-partitioned tables feature, but am unsure whether this is currently possible in version 1.6 of the Dataflow SDK. Looking at the BigQuery JS...
Trombidiasis asked 30/6, 2016 at 5:0
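The question targets Dataflow SDK 1.6 (2016); with a recent Beam Python SDK, a hedged sketch would ask BigQuery to create a day-partitioned table through additional_bq_parameters (table name and schema are placeholders). Writing to a partition decorator such as my_table$20160630 is the older alternative when the table already exists.

import apache_beam as beam

with beam.Pipeline() as p:
    _ = (p
         | beam.Create([{"event_time": "2016-06-30 05:00:00", "value": 1}])
         | beam.io.WriteToBigQuery(
             "my-project:my_dataset.partitioned_table",
             schema="event_time:TIMESTAMP,value:INTEGER",
             # Ask BigQuery to create the table as day-partitioned.
             additional_bq_parameters={"timePartitioning": {"type": "DAY"}},
             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))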

2

Solved

I have an Apache Beam based Dataflow job that reads, using the vcf source, from a single text file (stored in Google Cloud Storage), transforms text lines into Datastore Entities, and writes them into the data...
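A minimal Python sketch of that shape: read text lines, map each to a Datastore Entity, and write them out. ReadFromText stands in for the vcf source mentioned in the question, and the project, kind, and field layout are placeholders.

import apache_beam as beam
from apache_beam.io.gcp.datastore.v1new.datastoreio import WriteToDatastore
from apache_beam.io.gcp.datastore.v1new.types import Entity, Key

PROJECT = "my-project"  # placeholder

def line_to_entity(line):
    # Hypothetical mapping: first tab-separated field becomes the key name.
    fields = line.split("\t")
    key = Key(["Variant", fields[0]], project=PROJECT)
    entity = Entity(key)
    entity.set_properties({"raw_line": line})
    return entity

with beam.Pipeline() as p:
    _ = (p
         | "ReadLines" >> beam.io.ReadFromText("gs://my-bucket/input.vcf")
         | "ToEntity" >> beam.Map(line_to_entity)
         | "WriteToDatastore" >> WriteToDatastore(PROJECT))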

1

Solved

I would like to make a POST request through a DoFn in an Apache Beam pipeline running on Dataflow. For that, I have created a client which instantiates an HttpClosableClient configured on a PoolingHt...
Ruby asked 28/11, 2017 at 21:21
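The question is about the Java SDK, but the pattern it describes (build one pooled HTTP client per worker and reuse it across elements) has the same shape in Python; a hedged sketch with a hypothetical endpoint:

import apache_beam as beam

class PostToApi(beam.DoFn):
    def setup(self):
        # One pooled session per worker, analogous to reusing a pooled
        # HTTP client in the Java SDK.
        import requests
        self.session = requests.Session()

    def process(self, element):
        response = self.session.post(
            "https://example.com/api", json=element, timeout=30)
        response.raise_for_status()
        yield response.json()

    def teardown(self):
        self.session.close()

with beam.Pipeline() as p:
    _ = (p
         | beam.Create([{"id": 1}, {"id": 2}])
         | beam.ParDo(PostToApi())
         | beam.Map(print))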

1

Solved

I'm trying to read from Pub/Sub with the following code: Read<String> pubsub = PubsubIO.<String>read().topic("projects/<projectId>/topics/<topic>").subscription("projects/<...
Untouchable asked 16/5, 2017 at 13:58
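A Pub/Sub read is configured with either a topic or a subscription, not both. A minimal sketch in the Python SDK (the subscription path is a placeholder); reading from a topic instead would make the runner create a temporary subscription behind the scenes.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    _ = (p
         # Pass either topic= or subscription=, never both.
         | beam.io.ReadFromPubSub(
             subscription="projects/my-project/subscriptions/my-subscription")
         | beam.Map(lambda data: data.decode("utf-8"))
         | beam.Map(print))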