dataflow Questions

2

I'm using Python Beam on Google Dataflow. My pipeline looks like this: Read image urls from file >> Download images >> Process images. The problem is that I can't let the Download images step scale...
Yusuk asked 5/9, 2018 at 11:1

3

Solved

I'm trying this example to retrieve data from GCP Pub/Sub in Dataflow. import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.IOException; import java.time.Instant; i...
Prepossess asked 27/9, 2018 at 3:11

3

I was reading the docs about schemas in Apache Beam, but I cannot understand what their purpose is, how and why, or in which cases I should use them. What is the difference between using schem...
Mai asked 16/6, 2020 at 16:13

1

I have a luigi task that performs some non-stable computations. Think of an optimization process that sometimes does not converge. import luigi MyOptimizer(luigi.Task): input_param: luigi.Param...
Afton asked 4/5, 2020 at 16:4

6

Solved

I want to paint a diagram where you can see the dataflow of a java program, and if there are one or multiple threads handling the data. Sequence charts don't show multithreading and get very confu...
Fadiman asked 14/6, 2011 at 14:53

6

Solved

I have an SSIS package with a Data Flow that takes an ADO.NET data source (just a small table), executes a select * query, and outputs the query results to a flat file (I've also tried just pulling...
Align asked 19/5, 2010 at 18:41

4

Solved

I'm trying to set up a controller service account for Dataflow. In my Dataflow options I have: options.setGcpCredential(GoogleCredentials.fromStream( new FileInputStream("key.json")).createScop...

1

I am attempting to run a Dataflow job using a Flex Template, but I am getting stuck on a 'module not found' error and cannot figure out why. Here is the structure of my directory: |__ modules |__...
Noneffective asked 6/6, 2021 at 9:27

2

Scenario: running Dataflow jobs on project A using a shared VPC to use the region and subnetwork of host project B. On the service account, I have the following permissions on both project A and B: Compu...

3

Solved

I have a TransformManyBlock with the following design: Input: Path to a file Output: IEnumerable of the file's contents, one line at a time I am running this block on a huge file (61GB), which ...
Gladdie asked 23/6, 2015 at 5:31

1

Solved

Is it possible to get TransformManyBlocks to send intermediate results as they are created to the next step instead of waiting for the entire IEnumerable<T> to be filled? All testing I've d...
Hinckley asked 12/6, 2020 at 0:56

2

Solved

I have a requirement to build a scalable process. The process consists mainly of I/O operations with some minor CPU operations (mainly deserializing strings). The process queries the database for a list of ur...
Harmonize asked 31/7, 2018 at 14:41

2

Solved

I'm building a simple pipeline using Apache Beam in Python (on GCP Dataflow) to read from Pub/Sub and write to BigQuery, but I can't handle exceptions in the pipeline to create alternative flows. On a sim...
Aesthetic asked 29/1, 2019 at 17:0

2

I'm trying to launch a Dataflow job on GCP using Apache Beam 0.6.0. I am compiling an uber jar using the shade plugin because I cannot launch the job using "mvn:execjava". I'm including this depend...
Appurtenant asked 21/3, 2017 at 20:13

3

What is the difference between Data Flow Analysis and Abstract Interpretation, and are they used for the same purpose? What are the pros and cons of these two relative to each other?

2

We are currently in the process of deploying a new Spring Data Flow stream application in our AWS EKS cluster. As part of this, the pods launched by the Skipper should have the IAM roles defined in...

3

Solved

I'm very new to GCP and Dataflow. However, I would like to start testing and deploying a few flows using Dataflow on GCP. According to the documentation and everything around Dataflow is imper...

0

I have a Dataflow pipeline in which I consume Pub/Sub messages, process them, and then publish to Pub/Sub. Whenever I have too many calculations (i.e., I increase the amount of processing for each message...
Crosslegged asked 18/7, 2019 at 13:27

1

Solved

I have a general question on side inputs and broadcasting in the context of Apache Beam. Do any additional variables, lists, or maps that are needed for computation during processElement need to be p...

1

To my knowledge, I should be able to use EnvironmentObject to observe & access model data from any view in the hierarchy. I have a view like this, where I display a list from an array that's in...
Polypary asked 8/6, 2019 at 17:30

1

Solved

We are currently working on a streaming pipeline on Apache Beam with DataflowRunner. We are reading messages from Pub/Sub and do some processing on them and afterwards we window them in slidings wi...
Septarium asked 20/2, 2019 at 10:30

3

Solved

I have a use case where I need to capture the data flow from one API to another. For example, my code reads data from a database using Hibernate, and during the data processing I convert one POJO to ano...
Charcuterie asked 21/2, 2019 at 17:32

1

Solved

I'm setting up a slow-changing lookup Map in my Apache-Beam pipeline. It continuously updates the lookup map. For each key in lookup map, I retrieve the latest value in the global window with acc...
Morelos asked 29/1, 2019 at 13:46

1

Solved

Say, in a Dataflow/Apache Beam program, I am trying to read a table whose data is growing exponentially. I want to improve the performance of the read. BigQueryIO.Read.from("projectid:dataset....
Federative asked 29/1, 2019 at 4:4

1

Solved

My current understanding is that NiFi processor properties are specific to that processor. So adding a new property to a processor will only be visible within that processor and not be passed on to...
Regina asked 19/1, 2019 at 18:23
